CHANGED-GOAL OR CUE-STRENGTHENING? EXAMINING JUDGMENT OF 
LEARNING REACTIVITY THROUGH THE LENS OF THE DUAL-RETRIEVAL 
MODEL 
 
 
 
 
 
 
 
A Dissertation 
Presented to the Faculty of the Graduate School 
of Cornell University 
In Partial Fulfillment of the Requirements for the Degree of 
Doctor of Philosophy 
 
 
 
 
 
 
by 
Minyu Chang 
May 2022  
 
 
 
 
 
 
 
 
 
 
 
 
 
© 2022 Minyu Chang
 
 
CHANGED-GOAL OR CUE-STRENGTHENING? EXAMINING JUDGMENT OF 
LEARNING REACTIVITY THROUGH THE LENS OF THE DUAL-RETRIEVAL 
MODEL 
 
Minyu Chang, Ph. D.  
Cornell University 2022 
 
Recent evidence suggests that making judgments of learning (JOLs) can directly 
modify subsequent memory performance, which is referred to as JOL reactivity.  The 
present dissertation examined the underlying mechanism of JOL reactivity by (a) 
testing the two major theoretical explanations for JOL reactivity: the changed-goal 
hypothesis and the cue-strengthening hypothesis, and (b) pinpointing the retrieval 
processes that are modified by JOLs with the implementation of the dual-retrieval 
model.  Here, the changed-goal hypothesis assumes that JOLs highlight the difference 
in learning difficulties among to-be-remembered items and switch learners’ goals from 
mastering all items to focusing more on easier items at the expense of harder items, 
thus producing negative reactivity for the latter.  The cue-strengthening hypothesis 
posits that the act of making JOLs strengthens the cues that inform JOLs, thus 
producing positive reactivity when later memory tests are sensitive to the strengthened 
cues.  In Experiment 1, I compared the reactive effects of item-level JOLs on 
associative recall between three types of word pairs that differ in learning difficulty: 
strongly related, weakly related, and identical pairs.  In Experiment 2, I tested whether 
prestudy JOLs produced reactive effects similar to those of immediate JOLs on associative 
recall for related pairs.  In Experiment 3, I investigated whether JOL reactivity was 
moderated by inter-item relation (word pairs whose targets were either semantically 
related or unrelated), JOL type (item-level or list-level), and test format (associative or 
free recall).  In Experiment 4, I inspected whether reactivity of item-level and list-level 
JOLs was moderated by list organization (blocked or randomized) in free recall for 
categorical word lists.  The experiments offered converging support for the cue-
strengthening hypothesis rather than for the changed-goal hypothesis.  Moreover, 
although positive JOL reactivity was always accompanied by improvements in 
recollection for item-specific verbatim details, it was also sometimes supported by 
enhancements in non-recollective operations (reconstruction and familiarity).  
Particularly, the process-level mechanism of JOL reactivity varied with material type, 
JOL type, and test format, which is consistent with the cue-strengthening hypothesis.  
Last, a contextual framework was recommended for further investigations into JOL 
reactivity. 
  
BIOGRAPHICAL SKETCH 
Minyu Chang was born on October 1st, 1995 in Hunan Province, China.  In 
2017, she received a Bachelor of Social Science degree (with first honor distinction) 
from the University of Hong Kong, where she majored in Psychology and minored in 
Cognitive Science.  In the same year, she joined Dr. Charles Brainerd’s Memory and 
Neuroscience Lab as a Ph. D. student.  During her doctoral program, her research 
has revolved around three connected topics: episodic memory, metamemory, and cognitive 
aging, with an overarching goal to understand the cognitive and metacognitive 
processes that govern learning and memory and the developmental changes in these 
processes across the human life span.  Specifically, she has implemented behavioral 
experimentation and mathematical modeling to understand the semantic factors that 
affect memory processes, the metacognitive processes that regulate learning, and the 
developmental and disease trajectories in cognition during late adulthood.  The present 
dissertation is part of her work in metamemory. 
 
ACKNOWLEDGMENTS 
First, I would like to thank my supervisor, Dr. Charles Brainerd, for his 
wonderful mentorship not only for my dissertation work but throughout the doctorate 
program.  Dr. Brainerd has offered me all I expect from an advisor: He provides me 
with freedom and guidance to explore my own research ideas and he is always 
available and supportive when I need help.  I have learned so much from him about 
how to be a researcher, a lab leader, and a supervisor.  I would also like to thank my 
committee members Dr. Valerie Reyna and Dr. Adam Anderson, who have offered me 
generous support and helpful feedback.  Additionally, I would like to express thanks to 
the undergraduate research assistants in the Memory and Neuroscience Lab.  The 
experiments would not have been possible without their tremendous help in material 
preparation and data organization.   
Finally, I would like to extend my deepest gratitude to my father Chun Chang, 
for raising me and educating me so I can be who I am now and for always valuing my 
health and happiness more than my achievements.  Also, I am immensely grateful to 
my mother Jie Liu, who brought me into the world and offered me unconditional love.  
I will always miss her, and I hope I have made her proud.  I would also like to thank 
my boyfriend Yucheng Zhu, for offering a sympathetic ear during stressful times and 
for always having faith in me even when I failed to do so. 
  
TABLE OF CONTENTS 
 
BIOGRAPHICAL SKETCH
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTERS
CHAPTER 1. INTRODUCTION
  JOLs as Metamemory Judgments and Memory Modifiers
  Theoretical Explanations for JOL Reactivity
    Changed-Goal Hypothesis
    Cue-Strengthening Hypothesis
  Significance and Implications of JOL Reactivity
  The Dual-Retrieval Model
  An Overview of Present Experiments
CHAPTER 2. EXPERIMENT 1
  Method
    Participants
    Materials
    Procedure
  Results
    ANOVA Results for JOLs
    ANOVA Results for Recall
    Model Results
  Discussion
CHAPTER 3. EXPERIMENT 2
  Method
    Participants
    Materials
    Procedure
  Results
    ANOVA Results for JOLs
    ANOVA Results for Recall
    Model Results
  Discussion
CHAPTER 4. EXPERIMENT 3
  Method
    Participants
    Materials
    Procedure
  Results
    ANOVA Results for JOLs
    ANOVA Results for Associative Recall
    ANOVA Results for Free Recall
    Model Results for Associative Recall
    Model Results for Free Recall
  Discussion
CHAPTER 5. EXPERIMENT 4
  Method
    Participants
    Materials
    Procedure
  Results
    ANOVA Results for JOLs
    ANOVA Results for Recall
    Model Results
  Discussion
CHAPTER 6. GENERAL DISCUSSION
  Summary of Main Methodologies, Hypotheses, and Behavioral Findings
  Process-Level Mechanisms for JOL Reactivity
  Theoretical Implications and Future Directions
    A Contextual Framework for Understanding JOL Reactivity
    Implications for Other Encoding Tasks
    Questions That Remain to Be Answered
  Concluding Comments
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
 
LIST OF FIGURES 
 
Figure 2.1. Associative Recall in Experiment 1
Figure 3.1. JOLs in Experiment 2
Figure 3.2. Associative Recall in Experiment 2
Figure 4.1. An Overview of the Experiment Design of Experiment 3
Figure 4.2. Associative Recall in Experiment 3
Figure 4.3. Free Recall in Experiment 3
Figure 5.1. Free Recall in Experiment 4
 
LIST OF TABLES 
Table 1.1. A Summary of the Two Major Theoretical Explanations
Table 1.2. Definitions for the Dual-Retrieval Model Parameters
Table 1.3. A Summary of Theoretical Predictions for Experiments 1-4
Table 2.1. Dual-Retrieval Model Fits and Parameter Estimates for Experiment 1
Table 3.1. Dual-Retrieval Model Fits and Parameter Estimates for Experiment 2
Table 4.1. Dual-Retrieval Model Fits and Parameter Estimates for Experiment 3
Table 5.1. Dual-Retrieval Model Fits and Parameter Estimates for Experiment 4
Table 6.1. A Summary of Designs and Recall Findings for Experiments 1-4
Table 6.2. A Summary of Dual-Retrieval Model Findings for Experiments 1-4
 
CHAPTER 1 
INTRODUCTION 
Judgments of learning (JOLs) refer to people’s predictions of their future memory 
performance for the currently studied materials, which reflect their metacognitive monitoring of 
their own learning processes.  Traditionally, JOLs were assumed to only assess but not alter the 
underlying learning processes.  However, accumulating research has demonstrated that the 
solicitation of JOLs can directly modify subsequent memory performance (Janes et al., 2018; 
Mitchum et al., 2016; Myers et al., 2020; Rivers et al., 2021; Senkova & Otani, 2021; 
Soderstrom et al., 2015; Tekin & Roediger, 2020; Witherby & Tauber, 2017b; Yang et al., 2015; 
Zhao et al., 2021).  Such memory effects are referred to as JOL reactivity (for reviews, see 
Double et al., 2018; Double & Birney, 2019), where positive JOL reactivity means that the 
solicitation of JOLs improves subsequent memory and negative JOL reactivity means that the 
solicitation of JOLs impairs subsequent memory. 
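As an illustration only, the sign convention just described can be sketched in a few lines of Python.  The function names and the recall proportions used in the example call are hypothetical and do not come from any study cited here.  The sketch simply treats reactivity as mean recall in a JOL group minus mean recall in a no-JOL group; an actual analysis would add an inferential test before labeling an effect.

```python
from statistics import mean


def jol_reactivity(recall_jol, recall_no_jol):
    """Mean recall in the JOL group minus mean recall in the no-JOL group.

    Both arguments are sequences of per-participant recall proportions (0 to 1).
    """
    return mean(recall_jol) - mean(recall_no_jol)


def reactivity_direction(effect):
    """Describe the sign of a reactivity estimate (descriptive only, no test)."""
    if effect > 0:
        return "positive"
    if effect < 0:
        return "negative"
    return "none"


# Hypothetical recall proportions for two small groups of participants.
effect = jol_reactivity([0.6, 0.8], [0.5, 0.7])
print(reactivity_direction(effect))
```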
Several explanations have been proposed for JOL reactivity, among which the most 
widely studied are the changed-goal hypothesis (Mitchum et al., 2016) and the cue-strengthening 
hypothesis (Soderstrom et al., 2015).  However, given the recency of the two hypotheses, their 
predictive and explanatory power remains to be determined.  The primary aim of my dissertation 
was to test the predictions of the changed-goal hypothesis and the cue-strengthening hypothesis 
in a series of hypothesis-driven experiments.  Another objective was to investigate the process-
level mechanism for JOL reactivity.  To achieve this, I used the dual-retrieval model to pinpoint 
the specific retrieval processes that are modified by the solicitation of JOLs. 
In the current chapter, I first provide a brief review of how JOLs were traditionally 
studied and the recent evidence of JOL reactivity.  Then, I discuss the major theoretical accounts 
for JOL reactivity and the significance of studying this effect.  After that, I explain the dual-
retrieval model, which was used in the present dissertation to estimate the underlying retrieval 
processes.  Last, I provide an overview of the four experiments in the present dissertation.  
JOLs as Metamemory Judgments and Memory Modifiers 
In the classic monitoring and control framework of metacognition (T. O. Nelson & 
Narens, 1990), cognitive processes are split into two interrelated levels.  One is the meta-level, 
and the other is the object-level.  The two levels are interrelated in that information flows from 
the object-level to meta-level via monitoring processes, and based on the information inputs, the 
meta-level in turn regulates the object-level via control processes.  Accordingly, metacognitive 
monitoring is vital for learning performance as it can guide learners to allocate study time, 
regulate cognitive resources, and revise study strategies (Dunlosky & Ariel, 2011; Kornell & 
Bjork, 2008; Metcalfe & Finn, 2008).   
Metacognitive monitoring is usually measured by introspective self-reports.  JOLs are a 
typical example of such methods, which ask participants to make self-assessments of learning 
outcomes during encoding.  The most common form of JOLs is item-level JOLs.  To illustrate, 
participants typically study a series of single words, word pairs, or other materials, and they are 
asked to make a judgment at the end of each study trial regarding the likelihood of remembering 
the item on a later memory test.  Another less studied form of JOLs is aggregate JOLs (e.g., 
Mazzoni & Nelson, 1995; Stevens & Pierce, 2019), in which participants are asked to provide a 
global assessment for a set of studied materials, such as how many items from the prior set they 
expect to remember on a later memory test.  Unless otherwise specified, “JOLs” refer to item-
level JOLs throughout this chapter. 
A tacit assumption underlying the use of introspective self-report methods is that they 
merely monitor a given cognitive process without affecting it (Soderstrom et al., 2015).  In other 
words, there should be no reactivity.  In the context of JOLs, it is often implicitly assumed that 
the solicitation of JOLs should not modify the monitoring processes measured by JOLs and 
hence should not have any direct effect on later memory performance.  However, this assumption 
seems to face a challenge from a large body of literature on common encoding tasks (e.g., deep 
processing, survival processing, etc.).  For instance, research on the level-of-processing effect has 
shown that asking participants to make judgments regarding the semantic content of study 
materials (deep processing; e.g., rating pleasantness of each word on a numeric scale) can 
produce robust memory benefits (Bower et al., 1974; Craik & Tulving, 1975).  Similarly, 
research on survival processing indicates that requiring participants to rate vocabulary words for 
their relevance to a survival scenario (e.g., trapped in a grassland) results in better retention for 
those words (Nairne et al., 2007).  Note that there is considerable similarity between JOLs and 
these common encoding tasks:  They all require participants to make certain judgments about the 
study materials during encoding.  Thus, there is reason to suspect that JOLs, like the other 
encoding tasks, can directly affect subsequent memory performance.  
Indeed, the findings of some early JOL studies were in opposition to the no-reactivity 
assumption.  Arbuckle and Cuddy (1969; Experiment 2) found that recall for word pairs was 
better for participants who were asked to make JOLs than for participants who were not required 
to make JOLs.  Furthermore, King et al. (1980) reported that after a series of study trials for 
word pairs, participants who made JOLs and participants who had an opportunity to restudy the 
word pairs displayed comparable memory performance on the final test.  This suggests that 
making JOLs can enhance later memory performance much as additional study trials do. 
Among later studies that include both a JOL condition and a no-JOL control condition, 
some did not find evidence for JOL reactivity (Ariel et al., 2021; Benjamin et al., 1998; 
Dougherty et al., 2018; Kelemen & Weaver III, 1997; Kornell & Bjork, 2008; Tauber & Rhodes, 
2012), whereas many others did (Dougherty et al., 2005; Janes et al., 2018; Mitchum et al., 2016; 
Myers et al., 2020; Rivers et al., 2021; Senkova & Otani, 2021; Soderstrom et al., 2015; Tauber 
& Witherby, 2019; Tekin & Roediger, 2020; Witherby & Tauber, 2017b; Yang et al., 2015; 
Zechmeister & Shaughnessy, 1980; Zhao et al., 2021).  Double et al. (2018) reported a meta-
analysis for 17 experiments of this sort.  Their results showed that there was a moderate positive 
JOL reactivity for related word pairs and single-word lists, but no reactivity for unrelated word 
pairs.  However, as Double et al. noted, caution should be taken in interpreting the results of 
word-list experiments, as only three experiments using word lists were included in the meta-
analysis.   
It should be noted that many studies cited above were not designed to evaluate JOL 
reactivity, which leads to a lack of methodological standardization that may be responsible for 
the mixed findings.  For example, Dougherty et al. (2005, 2018) administered a recall test prior 
to the solicitation of JOLs, so participants already had test experience with the to-be-tested word 
pairs when making JOLs.  Additionally, in Tauber and Rhodes (2012), JOLs were always 
followed by restudy choices.  Similarly, in Kornell and Bjork (2008), JOLs were made after 
“drop” decisions (i.e., decisions to set aside and stop studying items that one already knows) and 
were only solicited for the dropped items.  These studies all had retrieval- or judgment-type tasks 
administered before JOLs, which can potentially mask JOL reactivity.  Thus, a systematic 
investigation into JOL reactivity requires more focused experimentation.  
Here, Mitchum et al. (2016) and Soderstrom et al. (2015) are two of the most influential 
studies that systematically investigated JOL reactivity.  They proposed two different theoretical 
explanations for JOL reactivity based on their findings, which are discussed in the next section.  
Mitchum et al. (2016) found that JOLs function as a memory modifier primarily through 
negative reactivity.  In their first three experiments, JOL solicitation and cue-target relation of 
word pairs were factorially manipulated, with the former being a between-subject manipulation 
and the latter being a within-subject manipulation.  Mitchum et al. demonstrated that when 
related and unrelated word pairs were studied in a mixed list, the solicitation of JOLs weakened 
the correlation between cue-target relatedness and self-paced study time, suggesting that the 
overall tendency to allocate less study time to related pairs and more study time to unrelated 
pairs was reduced relative to when no JOLs were requested.  Consequently, the discrepancy in 
memory performance between related and unrelated pairs was increased in the JOL condition 
compared with the no-JOL condition, which was largely driven by negative JOL reactivity for 
unrelated pairs.  
In addition, Mitchum et al. showed in Experiment 4 that negative JOL reactivity 
disappeared when participants studied a pure list of unrelated word pairs, where there were no 
salient cues for relative item difficulty.  Moreover, in Experiment 5, when study time was 
experimenter-paced rather than self-paced as in Experiments 1-4, the difference in memory 
performance between related and unrelated pairs was still larger in the JOL condition relative to 
the no-JOL condition.  Therefore, the gist of Mitchum et al.’s findings is that making JOLs 
produces greater discrepancy in memory performance between related and unrelated word pairs, 
which is mainly caused by negative reactivity for unrelated word pairs.  Mitchum et al. 
hypothesized that negative JOL reactivity arises out of participants’ adjustment of study strategy 
based on relative item difficulty.  That is, when prompted to make JOLs, participants tend to 
allocate less time to relatively hard items and more time to easily remembered items than 
when JOLs are not solicited.  
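To make the pattern concrete, the following minimal Python sketch lays out the 2 (JOL vs. no-JOL) x 2 (related vs. unrelated) design with made-up cell means.  The numbers, dictionary keys, and function names are assumptions chosen purely for illustration and are not data from Mitchum et al. (2016).  The sketch shows how a larger related-unrelated gap in the JOL condition can be driven entirely by negative reactivity for unrelated pairs.

```python
# Hypothetical cell means (proportion recalled); not data from any cited study.
cell_means = {
    ("jol", "related"): 0.80,
    ("jol", "unrelated"): 0.30,
    ("no_jol", "related"): 0.80,
    ("no_jol", "unrelated"): 0.40,
}


def relatedness_gap(means, condition):
    """Recall advantage of related over unrelated pairs within one condition."""
    return means[(condition, "related")] - means[(condition, "unrelated")]


def reactivity(means, pair_type):
    """Recall in the JOL condition minus recall in the no-JOL condition."""
    return means[("jol", pair_type)] - means[("no_jol", pair_type)]


# The gap is larger with JOLs than without them in this made-up example.
gap_jol = relatedness_gap(cell_means, "jol")
gap_no_jol = relatedness_gap(cell_means, "no_jol")

# Per-pair-type reactivity: zero for related pairs, negative for unrelated pairs.
reactivity_related = reactivity(cell_means, "related")
reactivity_unrelated = reactivity(cell_means, "unrelated")
```

In this sketch the difference between the two gaps equals the difference between the two reactivity estimates, which is simply the interaction contrast of the 2 x 2 design.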
In contrast, Soderstrom et al. (2015) reported that JOLs modify memory mainly 
through positive reactivity.  In their Experiments 1a and 1b, Soderstrom et al. similarly 
manipulated JOL solicitation between subjects and cue-target relation of word pairs within 
subjects.  They found that recall for strongly related word pairs was enhanced in the 
JOL condition compared with the no-JOL condition.  However, there was no difference in recall 
for weakly related or unrelated pairs between the JOL and no-JOL conditions.  In their Experiment 2, 
Soderstrom et al. manipulated JOL solicitation between subjects and generation condition (read 
versus generate) within subjects.  Specifically, in the JOL condition, participants made JOLs 
only for the read items but not for the generated items.  It turned out that the presence of JOLs 
attenuated the difference in recall between read and generated items but did not eliminate it.  To 
sum up, the main takeaway of Soderstrom et al.’s results is that making JOLs produces positive 
reactivity for strongly related word pairs, which is probably because JOLs enhance the 
processing of cue-target relation, similar to the generation task. 
Thereafter, positive JOL reactivity was replicated with different study materials, test 
formats, experimental manipulations, and populations.  In terms of study materials, positive JOL 
reactivity was found not only with related word pairs but also with categorized word lists 
(Senkova & Otani, 2021).  In terms of test formats, it was established with recognition tests 
(Myers et al., 2020) and delayed testing (Witherby & Tauber, 2017b).  In terms of experimental 
manipulations, positive JOL reactivity turned out to be robust when JOL solicitation was 
manipulated within subjects (Rivers et al., 2021; Yang et al., 2015) and within the depth-of-
processing paradigm (Tekin & Roediger, 2020).  Moreover, positive JOL reactivity was recently 
found in elementary school children (Zhao et al., 2021), too.  On the other hand, negative JOL 
reactivity was also observed in a few other studies (Double, 2019; Janes et al., 2018; Rivers et 
al., 2021; Schäfer & Undorf, 2021), although it only approached conventional significance in 
Janes et al. (2018). 
Theoretical Explanations for JOL Reactivity 
It was previously speculated that JOL reactivity may simply arise from 
extended study time (King et al., 1980; Rhodes, 2016).  This possibility was ruled out by the 
subsequent replications of JOL reactivity when study time was controlled between the JOL and 
no-JOL conditions (e.g., Janes et al., 2018; Myers et al., 2020; Soderstrom et al., 2015).  Thus 
far, multiple theoretical hypotheses have been proposed to explain how JOLs modify people’s 
processing of to-be-remembered materials and ultimately affect subsequent memory 
performance.  Below I discuss two major hypotheses for JOL reactivity: the changed-goal 
hypothesis (Mitchum et al., 2016) and the cue-strengthening hypothesis (Soderstrom et al., 
2015).  A summary of the core ideas and supporting and opposing evidence for the two 
hypotheses is presented in Table 1.1. 
Changed-Goal Hypothesis 
The changed-goal hypothesis (Mitchum et al., 2016) posits that the presence of JOLs will 
amplify the perception of differences in learning difficulty among the to-be-remembered items, 
making people switch their learning goals from mastery-oriented to performance-oriented.  
According to the discrepancy-reduction model of self-regulated learning (Dunlosky & Hertzog, 
1998), people tend to adopt mastery as a goal in normal learning situations, where they allocate 
more time studying harder items than studying easier items.  However, when participants are 
prompted to make JOLs, they gain a heightened awareness of the fact that some items are more 
likely to be remembered than others, and hence they may switch their goals from mastering as 
many items as possible to focusing on remembering the relatively easy items, namely, a 
performance-oriented goal.  According to the region of proximal learning framework (Metcalfe 
& Kornell, 2005), adopting a performance-oriented goal drives people to allocate more resources 
to remembering relatively easy and moderately challenging items at the expense of the most difficult 
items.  Therefore, the changed-goal hypothesis predicts that making JOLs will increase the 
discrepancy in memory performance between easier and harder items, which is largely driven by 
negative JOL reactivity for the latter.   
Here, it is clear that the changed-goal hypothesis predicts negative JOL reactivity for 
relatively difficult items.  Does it predict any JOL reactivity for relatively easy items?  Mitchum 
et al. (2016) did not detect such an effect.  However, based on the rationale behind the changed-
goal hypothesis, namely, the switch in learning goals prompts people to emphasize learning of 
easier items at the cost of harder items, it should also predict positive reactivity for relatively 
easier items.  This is many other researchers’ interpretation of the changed-goal hypothesis, too 
(e.g., Myers et al., 2020; Tekin & Roediger, 2020).  Still, negative JOL reactivity for relatively 
difficult items offers stronger support for the changed-goal hypothesis, because it is the most 
distinctive feature of this hypothesis. 
The changed-goal hypothesis has received support from some experimental results.  For 
instance, in Janes et al.’s (2018) Experiment 1, they used a between-subject manipulation of JOL 
solicitation and a within-subject manipulation of cue-target relation of word pairs, similar to both 
Mitchum et al. (2016) and Soderstrom et al. (2015).  Importantly, they added another 
manipulation: Study was self-paced for half of the participants and experimenter-paced for 
the other half.  Their results showed that JOL reactivity was attenuated in the self-paced 
condition compared to the experimenter-paced condition.  This was consistent with the changed-
goal hypothesis, because participants in the experimenter-paced condition were more likely to 
focus more on easier items at the expense of harder items given that they only had limited study 
time, while participants in the self-paced condition should be less likely to switch from a 
mastery-oriented goal to a performance-oriented goal since they had unlimited study time to 
master all items (Son & Metcalfe, 2000).  Further, in Experiment 2, they found that JOL 
reactivity was only reliable when participants studied a mixed list of related and unrelated pairs, 
but not when they studied a pure list of related pairs or unrelated pairs.  This was again 
consistent with the changed-goal hypothesis, according to which there should be no switch in 
learning goals in the absence of salient cues for relative item difficulty.  However, Janes et al. 
offered only partial support for the changed-goal hypothesis, as they did not replicate the 
decreased correlation between pair relatedness and study time. 
Moreover, findings from perceptual disfluency research provided some indirect support 
for the changed-goal hypothesis.  Perceptually disfluent items (e.g., backward masked, cursive 
font, smaller font size, blurred) are usually given lower JOLs than items presented in a normal 
format, as they are perceived to be harder to remember.  When no JOLs are elicited, such 
materials have been found to improve later memory performance, possibly by provoking 
elaborative processing.  However, the mnemonic effect of perceptual disfluency was wiped out 
by the presence of JOLs (Besken & Mulligan, 2013; Geller, 2017; Halamish, 2018; Rosner et al., 
2015).  This seems to be consistent with the changed-goal hypothesis’ prediction that JOLs 
prompt participants to enhance processing for items that are perceived to be relatively easy (i.e., 
fluent items) and reduce processing for items that are perceived to be relatively difficult (i.e., 
disfluent items), which thus erases the benefits of elaborative processing provoked by perceptual 
disfluency.   
Nevertheless, Tekin and Roediger’s (2020) recent findings undercut the changed-goal 
hypothesis.  These authors reported that the level-of-processing effect was attenuated in the JOL 
condition compared to the no-JOL condition.  Specifically, JOLs improved recognition for items 
in both the shallow (phonetic-oriented) and deep (semantic-oriented) processing tasks.  However, 
JOL reactivity was significantly larger for items in the shallow processing task than in the deep 
processing task, even though the former items were perceived as harder to remember (i.e., lower 
JOLs) than the latter ones.  This was contrary to what the changed-goal hypothesis predicts, as 
the hypothesis assumes relatively easy items (deeply processed items) should benefit more from 
making JOLs than relatively hard items (shallowly processed items).   
In addition, Ikeda et al. (2016) provided some evidence against the changed-goal 
hypothesis, too.  These authors instructed participants to study four types of word pairs 
(unrelated, weakly related, strongly related, and identical pairs) under either performance-
oriented or mastery-oriented instructions.  They found that there was no difference in recall 
performance between the performance- and mastery-oriented groups.  Importantly, there was no 
interaction between word pair type and goal orientation in either study time or recall 
performance, suggesting that different goal orientations did not modify participants’ study time 
allocation or memory performance.  This is clearly in opposition to the changed-goal 
hypothesis’s assumption that JOL reactivity results from participants’ changing learning goals 
from mastery-oriented to performance-oriented. 
Cue-Strengthening Hypothesis 
Soderstrom et al. (2015) proposed another hypothesis, the cue-strengthening hypothesis, 
which was developed based on the cue-utilization framework of JOLs (Koriat, 1997) and the 
transfer-appropriate multifactor account of generation effects (de Winstanley et al., 1996).  The 
cue-utilization framework (Koriat, 1997) suggests that JOLs are made based on a variety of cues 
that are available during encoding.  Specifically, Koriat theorized that there are three types of 
cues: intrinsic cues that are embedded in the to-be-remembered items (e.g., concreteness, word 
relatedness, etc.), extrinsic cues that are concerned with learning conditions or encoding 
processes applied by the learners (e.g., presentation duration, repetition, etc.), and mnemonic 
cues that are based on internal and subjective experience (e.g., processing fluency, familiarity, 
etc.).  Meanwhile, according to the transfer-appropriate multifactor account of generation effects, 
generation strengthens the information that is used in the generation task, and thus it improves 
memory performance when such information is useful in the later memory test (de Winstanley et 
al., 1996).  Combining those two accounts, the cue-strengthening hypothesis posits that JOLs 
enhance the cues that participants draw upon when making the judgments, and JOLs enhance 
subsequent memory performance if the later memory tests are sensitive to the strengthened cues. 
Myers et al. (2020) recently provided evidence supporting the cue-strengthening 
hypothesis.  In their four experiments, participants studied related or unrelated word pairs in the 
study phase and took an associative recall, free recall, or recognition test in the test phase.  
Participants were not told which test format would be administered in advance.  The results 
showed that JOLs displayed positive reactivity for related word pairs with associative recall and 
recognition tests, but not with free recall tests.  Myers et al. reasoned that this is because JOLs 
enhanced processing for item-specific cues, such as the relation between cue and target within a 
pair or specific features of the targets.  Because associative recall and recognition tests are both 
sensitive to such cues, performance on these two types of memory tests is enhanced by JOLs.  
However, free recall tests are more sensitive to inter-item relations rather than item-specific cues, 
which makes JOLs less beneficial for free recall.  Thus, no reactivity was observed in free recall 
tests. 
On a related note, Senkova and Otani (2021) hypothesized that making JOLs enhanced 
memory by specifically strengthening item-specific cues.  In their experiments, Senkova and 
Otani factorially manipulated JOL condition (JOL, no-JOL) and list type (categorized, 
uncategorized) between subjects and found that JOLs enhanced free recall performance for 
categorical lists but not for uncategorical lists.  They explained that the positive reactive effects 
of JOLs arise from enhanced item-specific processing, which complements the relational 
processing promoted by categorical lists.  This explanation was backed up by their findings that 
the recall enhancement produced by JOLs was comparable to two classic manipulations that are 
known to induce item-specific processing (Experiment 1: pleasantness rating; Experiment 2: 
mental imagery).  Thus, they concluded that JOLs improve subsequent memory performance by 
specifically strengthening item-specific cues, which can be seen as an additional assumption that 
is proposed for the cue-strengthening hypothesis.   
Nevertheless, Mitchum et al.’s (2016; Experiment 1) results were inconsistent with 
the cue-strengthening hypothesis.  Mitchum et al. examined whether JOL reactivity varies as a 
function of the associative direction of the cue-target word pair, which can be either forward 
(from cue to target) or backward (from target to cue).  Obviously, associative recall favors 
forward association more than backward association, as participants are required to produce the 
target word when given the cue word.  Research has suggested that associative recall 
performance was indeed better for forward than for backward pairs, even though participants 
made comparable JOLs for these two types of pairs (Koriat & Bjork, 2005; Maxwell & Huff, 
2021).  Therefore, given that associative recall was more sensitive to forward than backward 
association, the cue-strengthening hypothesis predicts stronger JOL reactivity for forward than 
for backward pairs.  Nevertheless, Mitchum et al. did not find any interaction between JOL 
condition (JOL, no-JOL) and associative direction (forward, backward), suggesting that JOL 
reactivity did not differ between forward and backward pairs.  
Meanwhile, as Rivers et al. (2021) commented, it would be hard for the cue-
strengthening hypothesis to accommodate negative JOL reactivity (Janes et al., 2018; Mitchum 
et al., 2016; Schäfer & Undorf, 2021) without further assumptions.  Additionally, the cue-
strengthening hypothesis would also have difficulty explaining why JOL reactivity disappeared 
when using a pure list of related or unrelated word pairs (Janes et al., 2018; Mitchum et al., 
2016).  However, in that connection, it is worth mentioning that positive JOL reactivity was 
successfully replicated with a pure list of related pairs in other studies (Tauber & Witherby, 
2019; Witherby & Tauber, 2017b).  Thus, further examination is required to determine 
whether JOL reactivity differs between mixed-list and pure-list designs.
Table 1.1 
A Summary of the Two Major Theoretical Explanations for Judgment of Learning Reactivity 
 
Theoretical explanation: Changed-goal hypothesis 
Core idea: Making JOLs switches learners’ goals from mastery-oriented to performance-oriented, which prompts them to focus more on learning easier items at the cost of harder items. 
Supporting evidence: 
- Negative reactivity for unrelated word pairs (Janes et al., 2018; Mitchum et al., 2016; Schäfer & Undorf, 2021) 
- Decreased correlation between item difficulty and study time (Mitchum et al., 2016; but see Janes et al., 2018) 
- Weaker JOL reactivity in the self-paced condition than in the experimenter-paced condition (Janes et al., 2018) 
- No JOL reactivity for either related or unrelated pairs in a pure-list design (Janes et al., 2018; Mitchum et al., 2016; but see Tauber & Witherby, 2019; Witherby & Tauber, 2017b) 
- The perceptual disfluency effect was eliminated in the presence of JOLs (Besken & Mulligan, 2013; Geller, 2017; Halamish, 2018; Rosner et al., 2015) 
Opposing evidence: 
- Weaker JOL reactivity in the deep processing condition than in the shallow processing condition (Tekin & Roediger, 2020) 
- Manipulating goal orientation did not affect study time or recall performance (Ikeda et al., 2016) 
 
Theoretical explanation: Cue-strengthening hypothesis 
Core idea: Making JOLs strengthens the cues that inform JOLs, which benefits memory performance if the memory test is sensitive to the strengthened cues. 
Supporting evidence: 
- Positive reactivity for related pairs but not for unrelated pairs (e.g., Soderstrom et al., 2015; Myers et al., 2020) 
- The generation effect was attenuated by the solicitation of JOLs (Soderstrom et al., 2015) 
- Positive reactivity for related pairs only in associative recall and recognition but not in free recall (Myers et al., 2020) 
- Positive reactivity for categorical lists but not for uncategorical lists (Senkova & Otani, 2021) 
Opposing evidence: 
- No difference in JOL reactivity between forward and backward associative pairs (Mitchum et al., 2016) 
- No JOL reactivity for related pairs in a pure-list design (Janes et al., 2018; Mitchum et al., 2016; but see Tauber & Witherby, 2019; Witherby & Tauber, 2017b) 
Significance and Implications of JOL Reactivity 
Given that JOLs are one of the most commonly used measures in metacognition research, 
JOL reactivity is an important phenomenon both empirically and theoretically.  First of all, JOL 
reactivity often manifests itself as an interaction between JOL solicitation and other experimental 
manipulations.  As a consequence, the memory effects of those manipulations 
may be artificially magnified or reduced in the presence of JOLs.  On the one hand, JOLs clearly 
exaggerated the memory effects of pair relatedness as recall differences between related and 
unrelated pairs were inflated (Mitchum et al., 2016; Soderstrom et al., 2015).  On the other hand, 
it was observed that the memory benefits of perceptual disfluency could be wiped out by the 
presence of JOLs (Besken & Mulligan, 2013; Geller, 2017; Halamish, 2018; Rosner et al., 2015).  
Therefore, if researchers hope to estimate the “pure” memory effects of certain manipulations, 
they should carefully consider whether to include JOLs in the experiment design or consider 
including a control condition without JOLs. 
On a related note, JOLs sometimes strengthen the memory effects of certain factors but 
not others when more than one factor is manipulated simultaneously.  For example, in 
Mitchum et al. (2016; Experiment 2), emotional valence and pair relatedness were factorially 
manipulated within subjects.  There was an interaction between pair relatedness (related, 
unrelated) and JOL condition (JOL, no-JOL), which was driven by the fact that recall for 
unrelated pairs was reduced in the JOL condition compared to the no-JOL condition.  However, 
there was no interaction between valence and JOL condition, which indicates that the valence 
effects remained the same regardless of whether JOLs were present.  Such preferential reactivity 
of JOLs would be especially problematic for research that aims at examining the interplay 
between multiple factors or identifying the predominant factors.  For instance, some studies have 
manipulated both reward value and pair relatedness in word pairs (Ariel et al., 2009; Soderstrom 
& McCabe, 2011; Yu et al., 2020).  They found that both factors enhanced memory, but reward 
effects overrode the pair relatedness effects when the two factors were in conflict (i.e., high values 
assigned to unrelated pairs and low values assigned to related pairs).  However, all the 
aforementioned studies requested JOLs in the study phase.  Considering that JOL reactivity may 
favor particular factors over others when those factors are manipulated simultaneously, it 
remains an open question whether those conclusions still hold in the absence of JOLs. 
Meanwhile, JOL reactivity can also constrain the theoretical interpretation of certain 
findings on the relation between metacognitive judgment and memory performance.  In 
metacognition research, the correspondence between JOLs and actual memory performance is 
often of principal interest, because it serves as an index of monitoring accuracy.  The fact that the 
memory effects of certain manipulations vary as a function of JOL solicitation implies two 
possible scenarios: Asking participants to report JOLs either evokes additional metacognitive 
processing that would not be engaged spontaneously or brings such processing from 
unconsciousness to consciousness (Double & Birney, 2019).  Taking perceptual disfluency 
research as an illustration, disfluency manipulations have been demonstrated to induce a 
dissociation between JOLs and memory performance, which is interpreted as a metacognitive 
illusion (Besken & Mulligan, 2013; Castel, 2008; Yue et al., 2013).  However, the mere act of 
requesting JOLs from participants may make them engage in additional processing that 
counteracts the disfluency advantage.  Thus, researchers should be aware that the relation 
between metacognitive judgment and memory performance may be artificially altered when they 
use JOLs to measure metamemory judgment.  Accordingly, it is advisable to incorporate JOL reactivity 
into the current explanations for the relation between JOLs and memory. 
Last, JOL reactivity also has potential educational implications.  For instance, it is 
common for textbook writers to add glossaries or short quizzes at the end of each section to help 
learners rehearse the content.  In that connection, recall that King et al. (1980) showed that 
making JOLs produced comparable memory benefits relative to restudying the items.  
Meanwhile, Ariel et al. (2021) showed that making JOLs after retrieval practice (i.e., short-answer 
questions) for educational materials led to even better performance on later memory 
tests than retrieval practice alone, suggesting that JOLs produced memory 
benefits independent of retrieval practice.  Thus, learning outcomes may improve if self-assessment 
questions similar to JOLs are incorporated into the glossaries or short quizzes in 
textbooks.  Similarly, instructors can consider inserting JOL-like questions for key concepts in 
their lecture slides, which may improve students’ retention of those concepts. 
The Dual-Retrieval Model 
As discussed above, Senkova and Otani (2021) proposed that JOLs improve subsequent 
recall by strengthening item-specific processing.  To test this hypothesis, they compared the 
memory effects of JOLs to those of two other item-specific processing tasks (pleasantness rating 
and mental imagery).  Their finding showed comparable levels of memory enhancement for 
those three conditions, which was in line with their hypothesis.  However, this finding offers 
relatively weak support for their item-specific hypothesis, because it remains unknown whether 
the similar memory effects were due to similar underlying processes.  To answer that question, 
one will need to make process-level measurements and identify the specific processes that are 
modified by JOL solicitation.  To do that, in the present dissertation I implemented the dual-retrieval 
model (Brainerd et al., 2009; Gomes et al., 2014), a tool that measures the 
underlying retrieval processes in all conventional recall paradigms. 
The dual-retrieval model is a Markov model developed based on fuzzy-trace theory’s 
(FTT) distinction between verbatim and gist traces (Brainerd & Reyna, 1998).  FTT posits that 
people separately encode and store two types of memory traces: verbatim traces of detailed 
surface content and other item-specific information and gist traces of semantic, elaborative, and 
relational content.  Verbatim traces support errorless recollective retrieval, in which the vivid, 
realistic surface details of an item’s prior presentation can be directly accessed and consciously 
reinstated.  For example, if the verbatim traces of a studied word “sheep” are directly accessible, 
this word’s prior presentation would be vividly reinstated in consciousness, as if heard via the 
mind’s ear or seen via the mind’s eye.   
Gist traces, instead, support non-recollective retrieval, in which subjects need to 
reconstruct items based on partially identifying information (typically semantic information) 
when verbatim traces cannot be directly accessed.  For example, if one cannot access the 
verbatim traces of the word “sheep” but remembers the semantic gist of the word (e.g., a certain 
four-footed farm animal), then some possible answers can be generated, such as horse, 
sheep, cow, goat, etc., from which a final answer can be selected and output.  Since people 
need to search through a set of candidate items that fit with the partial identifying information, 
reconstruction is potentially fallible.  Namely, people may reconstruct items that were never 
studied but are consistent with the semantic gist, thus leading to false recall.  To help rule out 
those distractors in the search set, a familiarity judgment operation is implemented before 
outputting the reconstructed item.  That is, only reconstructed items that exceed a certain 
familiarity threshold will be finally outputted.   
The dual-retrieval model is constructed to provide quantitative estimates of both 
verbatim-based recollective retrieval and gist-based non-recollective retrieval.  In the dual-
retrieval model, the verbatim-based recollective operation is named direct access (D), and the 
gist-based non-recollective operation is labeled reconstruction (R), which is accompanied by a slave 
operation of familiarity judgment (J).  The definitions of these parameters are presented in Table 1.2.  
Because the dual-retrieval model assumes that recall is controlled by either recollective or 
non-recollective retrieval, the probability of successful recall in a single test trial is the sum of 
the probability of successful recollection and the probability of successful non-recollective 
retrieval.  This can be expressed as a function of recollective and non-recollective parameters: 
p(C) = D + (1 - D)RJ, where C means correct recall.  Similarly, the probability of unsuccessful 
recall can be expressed as a function of both recollective and non-recollective parameters: p(E) = 
(1 - D)R(1 - J) + (1 - D)(1 - R), where E means recall error.   
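As a rough illustration of the two single-trial equations above, the following Python sketch computes p(C) and p(E) from D, R, and J.  The function name and the parameter values are assumptions chosen purely for illustration; they are not estimates from any experiment reported here. 

```python
def single_trial_probs(d, r, j):
    """Return (p_correct, p_error) for a single recall test.

    d: probability of direct access (D)
    r: probability of reconstruction when direct access fails (R)
    j: probability that a reconstructed item passes familiarity judgment (J)
    """
    for name, value in (("d", d), ("r", r), ("j", j)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # p(C) = D + (1 - D)RJ
    p_correct = d + (1 - d) * r * j
    # p(E) = (1 - D)R(1 - J) + (1 - D)(1 - R)
    p_error = (1 - d) * r * (1 - j) + (1 - d) * (1 - r)
    return p_correct, p_error


# Hypothetical parameter values, for illustration only.
p_c, p_e = single_trial_probs(d=0.4, r=0.5, j=0.8)
print(round(p_c, 4), round(p_e, 4))
```

Whatever values are supplied, the two probabilities sum to 1, so a single test yields only one free data point for three unknown parameters. 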
However, with only a single test trial, which provides two empirical probabilities p(C) 
and p(E), an identifiability problem occurs.  Namely, there are not enough degrees of freedom to 
estimate three parameters.  For this reason, implementation of the dual-retrieval model always 
requires data from at least three standard recall tests.  In this case, the error-success sequence 
across three recall tests can be expressed as a function of recollective and non-recollective 
parameters, similar to the two equations above (see Appendix A for more details).  Because one 
single test provides two empirical probabilities, p(C) and p(E), three recall tests together would 
provide eight empirical probabilities [i.e., p(CCC), p(CCE), p(CEC), p(CEE), p(ECC), p(ECE), 
p(EEC), p(EEE)].  In this case, there would be enough degrees of freedom to secure identifiable 
estimates of the parameters.  
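The counting argument above can be checked with a short Python sketch that enumerates the error-success sequences for three recall tests.  The variable names are illustrative assumptions; the only substantive point is that eight sequence probabilities that must sum to 1 leave seven free to vary. 

```python
from itertools import product

N_TESTS = 3  # number of recall tests, as described in the text

# Every ordered pattern of correct (C) and error (E) across the tests.
sequences = ["".join(pattern) for pattern in product("CE", repeat=N_TESTS)]

# The sequence probabilities sum to 1, so one of them is not free to vary.
free_probabilities = len(sequences) - 1

print(sequences)
print(free_probabilities)
```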
Three standard recall tests are the only prerequisite for using the dual-retrieval model, 
and one can administer either three study-test cycles or three consecutive test cycles following a 
single study phase.  Only slight modifications need to be made in the dual-retrieval model to 
accommodate such variations in experiment designs.  To preview, I used three consecutive recall 
tests following a single study phase in all experiments of the present dissertation.  Thus, in the 
specific version of the dual-retrieval model that is used here (Chang, 2019), a forgetting 
parameter (F) is added to the model, which indicates the probability that verbatim 
traces are forgotten after the first recall test (see Table 1.2).  The detailed mathematical machinery of the 
current version of the dual-retrieval model can be found in Appendix A.  Interested readers 
may also consult Gomes et al. (2014) for a more in-depth explanation of the various 
versions of the dual-retrieval model. 
Table 1.2 
Definitions for the Dual-Retrieval Model’s Parameters 
 
Parameter     Definition 
D             Direct access/recollection: The probability that the verbatim trace of an item’s presentation can be directly accessed on a recall test 
F             Forgetting of direct access: The probability that direct access to the verbatim trace of an item’s presentation is available on the first recall test but not on the following recall tests due to forgetting 
R             Reconstruction: The probability that an item can be reconstructed on a recall test when the verbatim trace of the item’s presentation cannot be directly accessed 
J1, J2, J3    Familiarity judgment: The probability that a reconstructed item is judged to be familiar enough to output.  J1, J2, J3 represent the familiarity judgment for the first, second, and third recall test, respectively 
 
Although the dual-retrieval model was developed based on FTT, it is often used 
independently from FTT since the validity of the model can always be established by its fits to 
the recall data.  The model fits were inspected using goodness-of-fit tests, which were 
conducted by computing the maximum likelihood statistic (G²).  In the current experiments, 
there were eight empirical probabilities, seven of which were free to vary because they sum to 1, 
and six parameters to be estimated (D, F, R, J1, J2, J3); thus, the model 
was fitted with one degree of freedom.  Because G²(1) is asymptotically distributed as χ²(1), the 
goodness-of-fit is evaluated by comparing the observed G²(1) to the critical value of χ²(1) for 
rejecting the null hypothesis, which is 3.84 at the .05 significance level.  Here, the null 
hypothesis is that the predicted and observed frequencies of the eight possible error-success 
sequences across three tests (i.e., CCC, CCE, CEC, CEE, ECC, ECE, EEC, EEE) are not 
significantly different from each other.  Thus, failure to reject the null hypothesis [G²(1) < 3.84] 
indicates that the model provides a statistically acceptable account of the data.  So far, the dual-
retrieval model has demonstrated excellent model fits with various types of recall data, such as 
free, associative, cued, and serial recall (Brainerd et al., 2002; Brainerd & Reyna, 2010).   
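To make the goodness-of-fit comparison concrete, the sketch below computes the standard likelihood-ratio statistic, G² = 2 Σ O ln(O / E), over the eight error-success sequences and compares it to the critical value of 3.84.  It is only an illustration under stated assumptions: the function name and both sets of counts are hypothetical, and in practice the expected counts would come from the fitted model described in Appendix A. 

```python
import math

CHI2_CRIT_DF1 = 3.84  # critical value of chi-square(1) at the .05 level (from the text)


def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E))."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        if o > 0:  # cells with an observed count of zero contribute nothing
            total += o * math.log(o / e)
    return 2.0 * total


# Hypothetical counts for CCC, CCE, CEC, CEE, ECC, ECE, EEC, EEE.
observed = [52, 9, 6, 8, 11, 5, 7, 22]
expected = [50.0, 10.0, 7.0, 8.0, 10.0, 6.0, 8.0, 21.0]

g2 = g_squared(observed, expected)
acceptable_fit = g2 < CHI2_CRIT_DF1  # failure to reject the null hypothesis
print(round(g2, 3), acceptable_fit)
```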
Examining model fits is always the first step in the statistical procedure of the dual-
retrieval model.  After the establishment of model fits, the next step is to compute maximum 
likelihood estimates of the model parameters.  Then, the final step is to test statistical differences 
in these parameters between different experimental conditions.  To test the null hypothesis that a 
parameter i is not significantly different between conditions A and B, one needs to first fit an 
unrestricted joint model to the combined data of conditions A and B.  The unrestricted joint 
model is created simply by combining two copies of the dual-retrieval model (one for condition A 
and the other for condition B).  After that, a restricted joint model is run by restricting the 
parameter i to be equal between conditions A and B.  Then, the difference in the maximum 
likelihood statistic G² between the unrestricted and restricted models, ΔG², is compared to the 
critical value of χ²(1), which is 3.84.  If ΔG² > 3.84, the null hypothesis will be rejected, meaning 
that the parameter i is significantly different between conditions A and B.  The step-by-step 
guide for testing dual-retrieval model fits, estimating model parameters, and conducting 
condition-wise parameter significance tests can be found at 
https://www.human.cornell.edu/hd/research/labs/memorylab/research. 
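The decision rule for the condition-wise parameter test can be sketched as follows.  This is a minimal illustration, not the procedure from the guide linked above: the function name and the two G² values are hypothetical, and the sketch assumes that both joint models have already been fitted elsewhere. 

```python
CHI2_CRIT_DF1 = 3.84  # critical value of chi-square(1) at the .05 level (from the text)


def parameter_differs(g2_unrestricted, g2_restricted, critical=CHI2_CRIT_DF1):
    """Return (delta_g2, reject_null) for one equality restriction.

    The restricted model is nested in the unrestricted one, so its G^2
    should not be smaller, apart from rounding error.
    """
    delta_g2 = g2_restricted - g2_unrestricted
    if delta_g2 < -1e-9:
        raise ValueError("restricted model cannot fit better than unrestricted model")
    delta_g2 = max(delta_g2, 0.0)
    return delta_g2, delta_g2 > critical


# Hypothetical fit statistics, for illustration only.
delta, reject = parameter_differs(g2_unrestricted=1.90, g2_restricted=7.15)
print(round(delta, 2), reject)
```

With these made-up values, ΔG² exceeds 3.84, so the equality restriction would be rejected and the parameter would be taken to differ between conditions A and B. 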
An Overview of the Present Experiments 
As reviewed above, recent studies have provided accumulating evidence for JOL 
reactivity and multiple theoretical explanations have been proposed.  However, given that 
systematic research on this topic is so recent, experimentation has not yet settled on a theoretical 
account for this phenomenon.  Therefore, the primary aim of the current study is to test the 
predictions of the two major theoretical accounts of JOL reactivity (the changed-goal hypothesis 
and the cue-strengthening hypothesis).  The second aim of the study is to specify the process-
level mechanism for JOL reactivity by identifying the specific retrieval processes that are 
responsible for the reactive effects of JOLs.  To achieve the first aim, I designed a series of 
hypothesis-driven experiments, whose major designs and hypotheses are described below 
(a summary of the major theoretical predictions can be found in Table 1.3).  To achieve the 
second aim, I administered three consecutive recall tests in all experiments and fitted the dual-
retrieval model (Chang, 2019) to the recall data. 
In Experiment 1, I compared the reactive effects of item-level JOLs between strongly 
related, weakly related, and identical word pairs.  Identical and strongly related pairs are usually 
given higher JOLs than weakly related pairs, suggesting that they are perceived as easier to 
remember than the latter.  Under this circumstance, the changed-goal hypothesis predicts 
negative reactivity for weakly related pairs but positive reactivity for strongly related and 
identical pairs, because a switch from mastery-oriented to performance-oriented goals prompts 
participants to focus more on easier items at the cost of harder items.  However, the cue-
strengthening hypothesis predicts positive reactivity for all three types of pairs, as JOLs should 
strengthen both cue-target identity and cue-target relatedness in those pairs, which are both 
diagnostic cues (i.e., cues that are useful in the later memory tests).   
In Experiment 2, I compared the recall performance for related and unrelated word pairs 
between a prestudy-JOL condition, an immediate-JOL condition, and a no-JOL condition.  Here, 
prestudy JOLs were made before the presentation of a word pair but with provided information 
about the pair type (i.e., related vs. unrelated), and immediate JOLs were conventional item-level 
JOLs that were made after studying each word pair.  The changed-goal hypothesis predicts 
comparable reactivity of prestudy and immediate JOLs because both types of JOLs increase 
participants’ awareness of the differences in learning difficulty between related and unrelated 
word pairs.  Thus, it predicts that both immediate and prestudy JOLs would produce negative 
reactivity for unrelated pairs but positive reactivity for related pairs.  However, the cue-
strengthening hypothesis predicts different patterns of reactivity between prestudy and 
immediate JOLs.  For related pairs, the cue-strengthening hypothesis predicts positive reactivity of 
immediate JOLs and either no or very weak positive reactivity of prestudy JOLs.  This is because 
most diagnostic cues are not available until a pair is studied, and thus these cues can only be 
strengthened by immediate JOLs but not by prestudy JOLs.  As for unrelated pairs, the cue-
strengthening hypothesis predicts little-to-no reactivity of both immediate and prestudy JOLs.  
Given that cue-target relation is a dominant diagnostic cue in making JOLs while there is no 
inherent semantic relation between cue and target in unrelated pairs, both types of JOLs are less 
likely to draw upon and strengthen diagnostic cues for unrelated pairs than for related pairs. 
In Experiment 3, I further tested the predictions of the cue-strengthening hypothesis.  
Here, I investigated how item-level JOLs and list-level JOLs (i.e., aggregate JOLs that ask 
people to judge how many words they can recall from a studied list) react to the target-target 
relatedness among word pairs given different criterion tests (associative versus free recall).  For 
target-target related pairs, the cue-strengthening hypothesis predicts positive reactivity of list-
level JOLs in free recall but negative or no reactivity of list-level JOLs in associative recall.  This 
is because list-level JOLs should strengthen processing of inter-pair target-target relatedness, 
which is helpful in free recall but is either harmful or irrelevant to associative recall (Brainerd & 
Reyna, 2010; Schwenn & Underwood, 1968; Underwood et al., 1965).  Additionally, the cue-
strengthening hypothesis predicts either negative or no reactivity of item-level JOLs in free recall 
and little-to-no reactivity of item-level JOLs in associative recall.  Regarding the former, free 
recall for target-target related pairs relies heavily on inter-pair processing, but item-level JOLs 
primarily focus participants’ attention on within-pair cue-target relatedness and deflect their 
attention from inter-pair target-target relatedness.  Regarding the latter, associative recall relies 
heavily on cue-target relation.  However, since there is no inherent semantic relatedness between 
cue and target within all pairs, item-level JOLs are less likely to draw upon and strengthen cue-
target relation.  For target-target unrelated pairs, the cue-strengthening hypothesis predicts little-
to-no reactivity of both list-level and item-level JOLs in both free recall and associative recall.  
This is because there is neither cue-target nor target-target relatedness for JOLs to draw upon 
and strengthen, and thus JOLs should not produce significant enhancements in memory 
performance. 
In Experiment 4, I tested Senkova and Otani’s (2021) item-specific hypothesis, which is 
closely related to the cue-strengthening hypothesis as it assumes that JOLs improve memory by 
enhancing item-specific cues embedded in the study materials.  I tested this hypothesis by 
factorially manipulating list organization of categorical lists (blocked, randomized) versus JOL 
solicitation (item-JOL, no-JOL).  The item-specific hypothesis predicts positive JOL reactivity 
for both blocked and randomized categorical lists in free recall because item-level JOLs should 
enhance item-specific processing, which complements the relational processing naturally evoked 
by both types of categorical lists.  Moreover, it predicts that item-level JOLs would primarily 
affect the direct access (D) parameters or the forgetting (F) parameters in the dual-retrieval 
model, which index recollection and forgetting of item-specific verbatim details, respectively.  
Table 1.3 
A Summary of the Major Theoretical Predictions for Recall Performance in Experiments 1-4 
Exps  Theoretical Predictions for Recall Performance 
1     Changed-goal hypothesis: 
        - Identical pair: Item-JOL > No-JOL 
        - Strong pair: Item-JOL > No-JOL 
        - Weak pair: Item-JOL < No-JOL 
      Cue-strengthening hypothesis: 
        - Identical pair: Item-JOL > No-JOL 
        - Strong pair: Item-JOL > No-JOL 
        - Weak pair: Item-JOL > No-JOL 
 
2     Changed-goal hypothesis: 
        - Related pairs: Immediate-JOL = Prestudy-JOL > No-JOL 
        - Unrelated pairs: Immediate-JOL = Prestudy-JOL < No-JOL 
      Cue-strengthening hypothesis: 
        - Related pairs: Immediate-JOL > Prestudy-JOL ≥ No-JOL 
        - Unrelated pairs: Immediate-JOL = Prestudy-JOL = No-JOL 
 
3     Cue-strengthening hypothesis: 
        - Target-target related pairs: 
            - Free recall: List-JOL > No-JOL, Item-JOL ≤ No-JOL 
            - Associative recall: List-JOL ≤ No-JOL, Item-JOL = No-JOL 
        - Target-target unrelated pairs: 
            - Free recall: List-JOL = No-JOL, Item-JOL = No-JOL 
            - Associative recall: List-JOL = No-JOL, Item-JOL = No-JOL 
 
4     Item-specific hypothesis: 
        - Blocked categorical lists: Item-JOL > No-JOL 
        - Randomized categorical lists: Item-JOL > No-JOL 
        - Item-level JOLs primarily affect the D or F parameters 
Note.  Exps = Experiments.  Item-JOL, immediate-JOL, prestudy-JOL, list-JOL, and no-JOL 
all refer to the corresponding JOL conditions.  D = direct access parameter.  F = forgetting 
parameter.  Additionally, “=” means statistically equivalent recall (i.e., little-to-no reactivity), 
“>” means significantly better recall (i.e., positive reactivity), “≥” means significantly better or 
statistically equivalent recall (i.e., positive or no reactivity), “<” means significantly worse 
recall (i.e., negative reactivity), and “≤” means significantly worse or statistically equivalent 
recall (i.e., negative or no reactivity). 
  
CHAPTER 2  
EXPERIMENT 1 
In Experiment 1, I used three different types of cue-target word pairs (weakly related, 
strongly related, and identical).  In this scenario, the changed-goal hypothesis and the cue-
strengthening hypothesis have different predictions about JOL reactivity.  It has been established 
in prior JOL studies that participants rank the memorability of the three types of pairs as identical 
pairs > strongly related pairs > weakly related pairs (Castel et al., 2007; Ikeda et al., 2016).  
Thus, the changed-goal hypothesis predicts that JOLs prompt participants to focus more on the 
least and moderately difficult items (identical and strongly related pairs) and less on the most 
difficult items (weakly related pairs), relative to when JOLs are not solicited.  As a result, JOLs 
would produce positive reactivity for identical and strongly related pairs, but negative reactivity 
for weakly related pairs.   
However, the cue-strengthening hypothesis predicts positive JOL reactivity for all three 
types of word pairs.  This is because targets of strongly related pairs, weakly related pairs, and 
identical pairs are all better recalled than those of unrelated pairs (Castel et al., 2007), 
indicating that cue-target semantic relation (in both strongly and weakly related pairs) and cue-
target identity (in identical pairs) are both diagnostic cues in subsequent associative recall tests.  
Therefore, because associative recall tests are sensitive to pair relation and word identity, the 
solicitation of JOLs, which presumably enhances these cues, should improve memory 
performance for all three types of pairs. 
Here, I pitted the two hypotheses against each other by examining whether there are 
differences in JOL reactivity between strongly related, weakly related, and identical pairs.  
Specifically, I used a 3 (Pair type: weakly related, strongly related, identical) × 2 (JOL condition: 
item-JOL, no-JOL) mixed design, with pair type manipulated within subjects and JOL condition 
manipulated between subjects. 
Method 
Participants 
Participants were 88 Cornell undergraduates (Mage = 20.31, SDage = 2.24) who 
participated for extra course credit.  Forty-one participants were randomly assigned to the item-
JOL condition, and 47 participants were randomly assigned to the no-JOL condition.  The 
sample size per condition was comparable to that of Castel et al. (2007), in which the JOL 
pattern of identical pairs > strongly related pairs > weakly related pairs and the recall pattern of 
strongly related pairs > weakly related pairs = identical pairs > unrelated pairs were established.   
Materials 
The experiment was programmed and administered via Qualtrics.  The materials were 72 
cue-target word pairs that were constructed based on the Nelson free association norms (D. L. 
Nelson et al., 2004).  All word pairs used in Experiment 1 can be found in Appendix B.  Among 
the 72 word pairs, 24 were pairs with strong forward associative strength (Mforward = .53, 
SDforward = .16; e.g., spoon - fork), 24 were pairs with weak forward associative strength (Mforward 
= .02, SDforward = .01; e.g., beard - trim), and 24 were identical pairs (e.g., ladder - ladder).  The 
three types of word pairs were controlled for concreteness, word frequency, and word length for 
both cues and targets.   
Procedure 
Participants were randomly assigned to either an item-JOL condition or a no-JOL 
condition.  All participants completed two blocks, and the order of the blocks was 
counterbalanced across participants.  In each block, there were a study phase and a test phase.  In 
the study phase, participants studied 36 word pairs, including 12 weakly related pairs, 12 
strongly related pairs, and 12 identical pairs.  In the test phase, they completed three associative 
recall tests for the word pairs they just studied.  The item-JOL and no-JOL conditions differed 
only in the study phase.   
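The block structure described above can be sketched as follows.  This is only an illustrative sketch under stated assumptions: the pair labels are placeholders (the actual pairs appear in Appendix B), the random assignment of pairs to blocks and the alternating counterbalancing scheme are assumptions, and the function names are hypothetical rather than part of the Qualtrics implementation. 

```python
import random

PAIR_TYPES = ("weak", "strong", "identical")
PAIRS_PER_TYPE = 24   # 72 pairs in total, as described in Materials
N_BLOCKS = 2


def build_blocks(seed=0):
    """Split the placeholder pairs of each type evenly across two blocks."""
    rng = random.Random(seed)
    blocks = [[] for _ in range(N_BLOCKS)]
    half = PAIRS_PER_TYPE // N_BLOCKS   # 12 pairs of each type per block
    for pair_type in PAIR_TYPES:
        # Placeholder labels only; the real stimuli are listed in Appendix B.
        pairs = [(pair_type, f"{pair_type}_{i:02d}") for i in range(PAIRS_PER_TYPE)]
        rng.shuffle(pairs)              # assumed: random assignment to blocks
        blocks[0].extend(pairs[:half])
        blocks[1].extend(pairs[half:])
    return blocks


def block_order(participant_index):
    """Alternate block order across participants (assumed counterbalancing scheme)."""
    return (0, 1) if participant_index % 2 == 0 else (1, 0)


blocks = build_blocks()
for number, block in enumerate(blocks, start=1):
    counts = {t: sum(1 for kind, _ in block if kind == t) for t in PAIR_TYPES}
    print(f"Block {number}: {len(block)} pairs, {counts}")
```

Each block in this sketch contains 36 pairs, with 12 pairs of each type, which matches the study-phase composition described in the Procedure. 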
In the study phase of both the item-JOL and no-JOL conditions, each word pair was 
presented for 10 seconds.  In the item-JOL condition, participants were informed in advance that 
for each word pair they studied, they were required to rate how likely they were to recall the word 
on the right-hand side of the pair when provided with the word on the left-hand side on a later 
memory test (from 0-100, with 0 = not likely at all and 100 = totally likely).  Participants were 
also told that they should fine-tune their judgments by using the whole 100-point percentage 
scale.  After each pair was presented for 4 seconds, a JOL prompt (“Likelihood to recall?”) 
appeared beneath the word pair.  Participants were given a maximum of 6 seconds to type their 
JOLs into a blank box under the JOL prompt.  When 6 seconds were up, the screen cleared and 
the program automatically proceeded to the next word pair.  In the no-JOL condition, the only 
difference from the item-JOL condition was that all word pairs were presented for 10 seconds 
without JOL prompts.   
In the test phase, participants completed three consecutive associative recall tests, with 
each test preceded by a 1-min buffer task of simple math problem solving.  Participants were not 
given additional study trials before the second or the third tests.  Before each recall test, 
participants were reminded that spelling did not count and that they did not need to worry about 
spelling.  There were in total 36 associative recall test trials, corresponding to the 36 studied 
word pairs.  In each associative recall test trial, participants were provided with the cue word of a 
word pair, and they were given a maximum of 10 seconds to type the target word that was paired 
with the cue word during the study phase.  They were allowed to advance after 2 seconds, but 
they were instructed to do so only when they finished typing the target word or when they were 
certain that they could not recall the target word.  Otherwise, the program would automatically 
advance to the next cue when 10 seconds were up.  The order of the test trials was randomized for each 
participant.   
Results 
ANOVA Results for JOLs  
A one-way repeated measures ANOVA was conducted to compare the effects of pair 
type (weakly related, strongly related, identical) on JOLs.  The ANOVA showed that there was a 
main effect of pair type, F(2, 80) = 50.11, MSE = 72.18, ηp² = .56, p < .001.  This main effect 
was driven by the fact that identical pairs (M = 72.2, SD = 17.8) and strongly related pairs (M = 
69.1, SD = 15.4) both received higher JOLs than weakly related pairs (M = 54.6, SD = 15.7), 
while there was no significant difference between JOLs for the former two pair types.   
ANOVA Results for Recall 
A 3 (Pair type: weakly related, strongly related, identical) × 2 (JOL condition: item-JOL, 
no-JOL) × 3 (Test: 1, 2, 3) mixed ANOVA was conducted for recall.  The ANOVA results 
revealed both a main effect of pair type, F(2, 172) = 139.10, MSE = .04, ηp² = .62, p < .001, and 
a main effect of JOL condition, F(1, 86) = 14.92, MSE = .43, ηp² = .15, p < .001.  Post hoc tests¹ 
revealed that participants recalled more identical pairs (M = .79, SD = .26) and strongly related 
pairs (M = .76, SD = .23) than weakly related pairs (M = .53, SD = .26), with no significant 
difference in recall between identical and strongly related pairs.  Meanwhile, as can be seen in 
Figure 2.1, the main effect of JOL condition was driven by the fact that recall was better in the 
item-JOL condition (M = .79, SD = .23) than in the no-JOL condition (M = .61, SD = .29).  
Notably, the JOL condition × Pair type interaction was not significant, suggesting that JOLs 
improved recall to a comparable extent for all three types of word pairs.  
 
¹ Unless otherwise specified, post hoc tests refer to Tukey’s test throughout the present dissertation. 
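The partial eta squared values reported for Experiment 1 can be recovered from each F ratio and its degrees of freedom through the standard identity ηp² = (F × df1) / (F × df1 + df2).  The following is a minimal sketch of that check; the function name and dictionary labels are illustrative and are not part of the original analysis. 

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported in the text for Experiment 1
reported_effects = {
    "JOLs, pair type": (50.11, 2, 80),
    "Recall, pair type": (139.10, 2, 172),
    "Recall, JOL condition": (14.92, 1, 86),
}

for label, (f_value, df_effect, df_error) in reported_effects.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {eta:.2f}")
```

Rounded to two decimals, the three values are .56, .62, and .15, in agreement with the effect sizes reported in the text. 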
 
 
 
Figure 2.1.  Associative recall for identical, strongly related, and weakly related word pairs 
across item-JOL and no-JOL conditions in Experiment 1.  Panel A = recall test 1.  Panel B = 
recall test 2.  Panel C = recall test 3.  Panel D = average recall across all three tests.  Error bars 
are based on SEs. 
Model Results 
The associative recall data were further analyzed with the dual-retrieval model (Chang, 
2019).  As can be seen in Table 2.1, the model delivered excellent fits across all six possible 
combinations of pair type (strongly related, weakly related, identical) and JOL condition (item-
JOL, no-JOL).  The average G²(1) is .08, which is far below the critical value of 3.84.   
Table 2.1  
Dual-Retrieval Model Fits and Parameter Estimates for Experiment 1 
Pair type   JOL condition   G²    D     F     J1    J2    J3    R 
Strong      Item-JOL        .00   .80   .01   .63   .84   .69   .39 
            No-JOL          .04   .59   .03   .62   .73   .68   .37 
Weak        Item-JOL        .26   .57   .00   .62   .75   .94   .17 
            No-JOL          .06   .39   .03   .58   .77   .78   .15 
Identical   Item-JOL        .02   .79   .01   .61   .74   .76   .67 
            No-JOL          .08   .61   .05   .55   .67   .68   .45 
Note. Strong = strongly related pairs; weak = weakly related pairs; identical = identical pairs. 
D = direct access parameter; F = forgetting parameter; J1 = familiarity judgment parameter for 
test 1; J2 = familiarity judgment parameter for test 2; J3 = familiarity judgment parameter for 
test 3; R = reconstruction parameter.  Parameters that differed significantly between item- and 
no-JOL conditions are printed in boldface.  
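The fit evaluation summarized in Table 2.1 can be illustrated with a short check that compares each G² statistic against the .05 critical value of a chi-square distribution with 1 degree of freedom.  This is a sketch rather than the original fitting code: the G² values are copied from the table, and the variable names are illustrative. 

```python
from statistics import NormalDist, mean

# G² fit statistics copied from Table 2.1 (one per pair type x JOL condition cell)
g2_values = {
    ("strong", "item-JOL"): 0.00,
    ("strong", "no-JOL"): 0.04,
    ("weak", "item-JOL"): 0.26,
    ("weak", "no-JOL"): 0.06,
    ("identical", "item-JOL"): 0.02,
    ("identical", "no-JOL"): 0.08,
}

# A chi-square variable with 1 df is a squared standard normal variable, so its
# .05 critical value equals the square of the two-tailed z critical value.
critical_value = NormalDist().inv_cdf(0.975) ** 2

average_g2 = mean(list(g2_values.values()))
rejected_cells = [cell for cell, g2 in g2_values.items() if g2 > critical_value]

print(f"critical value = {critical_value:.2f}")
print(f"average G2 = {average_g2:.2f}")
print(f"cells in which the model fit is rejected: {rejected_cells}")
```

No cell exceeds the critical value of 3.84, and the six statistics average to .08, consistent with the fit summary given in the text. 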
 
Next, to determine which underlying processes were responsible for JOL reactivity, I 
compared the retrieval parameters between item-JOL and no-JOL conditions for each pair type.  
The condition-wise parameter test revealed that the D parameter was consistently higher in the 
item-JOL condition compared to the no-JOL condition across the strongly related (.80 vs. .59), 
weakly related (.57 vs. .39), and identical pairs (.79 vs. .61), ∆G²s > 15.06, ps < .001.  This 
suggests that item-level JOLs enhanced direct access to item-specific verbatim details for all pair 
types.  In contrast, the F parameter was consistently lower in the item-JOL condition 
compared to the no-JOL condition across the strongly related (.01 vs. .03), weakly related (.00 
vs. .03), and identical pairs (.01 vs. .05), ∆G²s > 7.64, ps < .006, suggesting that JOLs provided a 
buffer against forgetting of verbatim traces for all pair types.   
Meanwhile, for weakly related pairs only, the J3 parameter was higher in the item-JOL 
condition than in the no-JOL condition (.94 vs. .78), ∆G² = 6.22, p = .013.  Thus, after making 
item-level JOLs, participants were more likely to output the reconstructed target words of weakly 
related pairs on the last recall test because those target words felt more familiar to them.  Last, 
the R parameter was higher in the item-JOL condition relative to the no-JOL condition (.67 
vs. .45) for identical pairs, ∆G² = 23.64, p < .001.  This indicates that JOLs enhanced 
participants’ abilities to reconstruct the target words of identical pairs when they did not have 
direct access to verbatim details of those words. 
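The p values attached to the condition-wise ∆G² tests can be reproduced from the chi-square distribution.  The sketch below assumes that each test compares a single parameter between the item-JOL and no-JOL conditions and therefore has 1 degree of freedom; that assumption, the function name, and the dictionary labels are illustrative and do not come from the original analysis code. 

```python
from math import erfc, sqrt


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    # If Z is standard normal, P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(x / 2.0))


# Difference statistics reported in the text (1 df per test is an assumption).
# The D and F entries are the reported lower bounds across the three pair types.
delta_g2 = {
    "D, all pair types (lower bound)": 15.06,
    "F, all pair types (lower bound)": 7.64,
    "J3, weakly related pairs": 6.22,
    "R, identical pairs": 23.64,
}

for label, value in delta_g2.items():
    print(f"{label}: delta G2 = {value:.2f}, p = {chi2_sf_1df(value):.4f}")
```

Under the 1-df assumption, the resulting p values agree with the reported thresholds: below .001 for the D and R comparisons, below .006 for the F comparisons, and approximately .013 for J3. 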
Discussion 
Experiment 1 showed that item-level JOLs produced positive reactivity for all three types 
of word pairs (strongly related, weakly related, and identical), and the effects of JOL condition 
did not interact with the effects of pair type.  Recall that the changed-goal hypothesis predicts 
that item-level JOLs would produce differential reactivity for the three types of pairs: positive 
reactivity for identical and strongly related pairs but negative reactivity for weakly related pairs.  
This is because the former two types of word pairs are perceived as easier to learn compared to 
the latter.  Thus, they should be prioritized when JOLs prompt participants to switch their 
learning goals from mastery-oriented to performance-oriented.  In contrast, the cue-strengthening 
hypothesis predicts that item-level JOLs should produce positive reactivity for all three types of 
pairs, as both cue-target relatedness and cue-target identity are diagnostic cues in subsequent 
memory tests.  Clearly, my results provide support for the cue-strengthening hypothesis rather 
than the changed-goal hypothesis.   
It is noteworthy that the current results provide rather strong counterevidence against 
the changed-goal hypothesis.  Mitchum et al. (2016) and Janes et al. (2018) suggested that even 
in the absence of significant negative reactivity, a larger discrepancy in memory performance 
between easier and harder items may reflect a shift from mastery-oriented to performance-
oriented goals, which is in line with the changed-goal hypothesis.  However, my results showed 
that there was no interaction between JOL condition and pair type, indicating that the recall 
difference between easier (strongly related and identical pairs) and harder items (weakly related 
pairs) remained invariant between the item-JOL and no-JOL conditions.  Additionally, there was 
no negative reactivity observed even at the trend level, but there was reliable and consistent 
positive reactivity across all three types of word pairs.  Thus, it seems difficult for the changed-
goal hypothesis to accommodate the current results without additional assumptions. 
For the first time, I was able to identify which underlying retrieval processes are altered 
by the solicitation of JOLs via the implementation of the dual-retrieval model.  As shown in 
Table 2.1, the D parameter was consistently higher in the item-JOL condition than in the no-JOL 
condition across all pair types, and the F parameter was consistently lower.  This suggests that 
making item-level JOLs helped participants retain direct access to the verbatim details of studied 
word pairs and reduced their susceptibility to forgetting.  In addition, it can be seen that there 
was an additional enhancement in familiarity judgment (J3) for weakly related pairs and 
reconstruction (R) for identical pairs.  Thus, although the reactive effects of JOLs were 
comparable for all three pair types at the behavioral level, the underlying process-level 
mechanisms were slightly different.  This was expected based on the cue-strengthening 
hypothesis because different cues might be strengthened during the process of making JOLs for 
different word pairs (e.g., the strength of pair association, the identity of words), and thus the 
underlying memory processes could be affected in different ways.   
It may be noted that previous studies showed that JOLs follow a pattern of identical 
pairs > strongly related pairs > weakly related pairs (Castel et al., 2007; Ikeda et al., 2016).  This 
established that the mechanism predicted by the changed-goal hypothesis should have operated 
in the current experiment given that certain pairs were perceived to be easier to remember than 
others.  The current results showed a slightly different JOL pattern of identical pairs = strongly 
related pairs > weakly related pairs.  One possible reason for the minor difference may be that 
unlike the previous studies (Castel et al., 2007; Ikeda et al., 2016), I did not include unrelated 
word pairs in the current study design.  The elimination of unrelated pairs may increase the 
granularity of the comparative processes between different word types.  That is, with unrelated 
pairs being taken out of the picture, participants’ focus of comparison may be switched from a 
categorical judgment of presence versus absence of relatedness (related versus unrelated pairs) to 
a more fine-grained judgment about the strength of relation (strongly related versus weakly 
related pairs).  Thus, the differences between strongly related and weakly related pairs may 
appear more salient, driving up the perceived memorability for strongly related pairs.  
Nevertheless, the slight differences in the JOL pattern should not undermine my goal to test the 
prediction of the changed-goal hypothesis.  Because there were significant, large differences in 
JOLs between identical and weakly related pairs and between strongly and weakly related pairs, 
the results still confirm that participants could distinguish the learning difficulty levels between 
these pair types.  That is, participants perceived the identical and strongly related pairs as easier 
to remember than weakly related pairs, which secures the foundation for the changed-goal 
hypothesis’s prediction of differential reactivity for identical and strongly related pairs versus for 
weakly related pairs. 
It may also be noted that although Soderstrom et al. (2015) reported no reactivity for 
weakly related pairs, the current experiment demonstrated positive reactivity for weakly related 
pairs.  Nevertheless, the current findings were not without precedent, as Tauber and Witherby 
(2019; Experiments 3, 4, & 5) also found positive JOL reactivity for weakly related pairs with 
young adults.  It is worth mentioning that the weakly related pairs used in the current experiment 
and in Tauber and Witherby (2019) were both constructed based on the Nelson free association 
norms (D. L. Nelson et al., 2004), whereas those used in Soderstrom et al. (2015) were 
constructed based on the MRC psycholinguistic database (Coltheart, 1981).  Thus, just as Tauber 
and Witherby noted, the discrepant findings may be attributed to the differences in stimuli, such 
as differences in the levels of cue-target relatedness in weakly related pairs. 
  
CHAPTER 3  
EXPERIMENT 2 
In Experiment 2, I tested both the changed-goal hypothesis’s and the cue-strengthening 
hypothesis’s predictions about reactivity of prestudy JOLs.  Prestudy JOLs, first developed 
by Castel (2008), are prompted with information about the to-be-studied item 
but are made before the actual presentation of the item.  For instance, before each word pair 
was presented, Mueller et al. (2013; Experiment 1) told participants whether they were about to 
study a related or an unrelated word pair and requested them to rate how likely they were to recall 
the word pair in a later memory test.  They found that, similar to immediate JOLs (i.e., the 
conventional item-level JOLs that are made immediately after each word pair is presented), 
prestudy JOLs were substantially higher for related pairs than for unrelated pairs.  This suggests 
that participants could differentiate learning difficulty between these two types of word pairs 
based on the provided prompts, even before the specific pairs were studied.   
According to the changed-goal hypothesis, immediate JOLs produce positive reactivity 
for related word pairs but negative reactivity for unrelated word pairs because the solicitation of 
JOLs enhances participants’ awareness that related pairs are more likely to be remembered 
than unrelated ones.  As a result, participants de-emphasize their goal of mastering all the pairs 
and instead focus more on learning relatively easier (related) pairs at the expense of learning 
relatively harder (unrelated) pairs (Mitchum et al., 2016).  Following this logic, the changed-goal 
hypothesis predicts similar reactive effects of prestudy JOLs as of immediate JOLs, because 
prestudy JOLs can also function as a reminder of the differences in learning difficulty between 
related and unrelated pairs.  Thus, prestudy JOLs should similarly switch participants’ goals 
from mastery-oriented to performance-oriented, therefore improving memory for related pairs 
but impairing memory for unrelated pairs.   
The cue-strengthening hypothesis, on the other hand, predicts a different pattern of JOL 
reactivity.  Recall that Koriat’s (1997) cue-utilization framework posits that immediate JOLs are 
based on three general classes of cues: intrinsic, extrinsic, and mnemonic.  To recap, intrinsic 
cues are inherent characteristics of the study items, extrinsic cues are factors that are relevant to 
encoding conditions or operations, and mnemonic cues are factors that pertain to participants’ 
subjective learning experience of the items.  Because prestudy JOLs are made before actually 
learning the items, they cannot possibly be based on mnemonic cues.  However, with 
participants’ knowledge about item type (e.g., related versus unrelated pairs) and encoding 
condition (e.g., single study opportunity), prestudy JOLs can still be based on intrinsic and 
extrinsic cues (Undorf & Bröder, 2020).    
Nevertheless, only a small portion of intrinsic cues are accessible when making prestudy 
JOLs, because the prestudy JOL prompts provide very limited information regarding the intrinsic 
properties of the to-be-remembered pairs.  To illustrate, subjects receive the same prompt of 
“you are about to study a related word pair” for all related pairs, without any information that is 
specific to an individual pair.  With such a standardized prompt, participants merely know that 
a semantic relation exists between the cue and target, but they still know nothing about the 
ways in which the cue and target are related (e.g., categorical relation, synonym, antonym, etc.) or 
how strongly the cue and target are related (e.g., relatively strong or weak).  Note that an associative 
recall test requires participants to remember the cue-target pairing that is specific to each pair.  
Thus, it is unclear how a uniform prompt for all related pairs can help participants output the 
specific target word that is paired with a given cue.   
Similarly, only partial extrinsic cues are processed when making prestudy JOLs.  Since 
participants may adopt different strategies for different pairs, the encoding strategies participants 
adopt for a specific pair might not be accessible until encoding it.  Furthermore, it is worth 
mentioning that JOLs were less sensitive to extrinsic cues than to intrinsic and mnemonic cues 
(Koriat, 1997).  Thus, it would be unclear whether the partial extrinsic cues can be strengthened 
by JOLs and picked up by the subsequent memory tests.  In summary, because substantially 
fewer cues are processed in making prestudy JOLs than in making immediate JOLs, and associative 
recall tests are not necessarily sensitive to the partial intrinsic and extrinsic cues that are 
processed in making prestudy JOLs, the cue-strengthening hypothesis predicts either a much 
weaker reactivity of prestudy JOLs relative to immediate JOLs or no reactivity at all.  
To test the two predictions discussed above, I used a 2 (Pair type: related, unrelated) × 3 
(JOL condition: prestudy-JOL, immediate-JOL, no-JOL) mixed design, with pair type 
manipulated within subjects and JOL condition manipulated between subjects.  The changed-
goal hypothesis predicts similar reactivity between prestudy and immediate JOLs, namely 
positive reactivity for related pairs but negative reactivity for unrelated pairs.  However, the cue-
strengthening hypothesis predicts different patterns of reactivity between prestudy and 
immediate JOLs:  Immediate JOLs should produce positive reactivity for related pairs, while 
prestudy JOLs should produce either no reactivity or weaker positive reactivity for related pairs 
compared to immediate JOLs.  Both prestudy and immediate JOLs should produce little-to-no 
reactivity for unrelated pairs. 
Method 
Participants 
Participants were 119 young adults (Mage = 23.13, SDage = 4.64) recruited from Prolific, 
an online experiment platform (Palan & Schitter, 2018).  They were all fluent English speakers 
who were located in the United States, Canada, or the United Kingdom.  Each participant was 
compensated $5.85 for participation.  Forty-two participants were randomly assigned to the 
immediate-JOL condition, 34 participants were randomly assigned to the prestudy-JOL 
condition, and 41 participants were randomly assigned to the no-JOL condition.  To ensure data 
quality, an attention check operation was implemented for each study trial in the no-JOL 
condition (see more details in the Procedure section below).  Following Myers et al. (2020), I 
adopted the criterion that participants who failed to provide JOLs (in the immediate- and 
prestudy-JOL conditions) or complete attention check questions (in the no-JOL condition) for at 
least 80% of the study trials would be removed.  Fortunately, no participant missed more than 
21% of the JOLs or attention check responses, suggesting that participants were complying with 
my instructions.  However, six participants were removed from analyses because they indicated 
in the post-experiment survey that they had taken notes during the study phase, even though I had 
provided explicit warning at the beginning of the experiment that they should not take notes 
during study at the risk of missing subsequent words or attention check questions.  This left a 
final sample of 40 participants in the immediate-JOL condition, 34 participants in the prestudy-
JOL condition, and 39 participants in the no-JOL condition.  The sample size of all conditions 
was larger than that of Mueller et al. (2013), which established robust differences in prestudy 
JOLs between related and unrelated pairs. 
Materials 
The experiment was programmed and administered via Qualtrics.  The materials were 80 
cue-target word pairs that were constructed based on the Nelson free association norms (D. L. 
Nelson et al., 2004).  Half of the 80 pairs were related word pairs (Mforward = .45, SDforward = .13; 
e.g., shore - beach), and the other half were unrelated word pairs (e.g., brush - coffee).  I 
carefully matched concreteness, word frequency, and word length for both cues and targets 
between the two types of word pairs.  All word pairs used in Experiment 2 can be found in 
Appendix C.   
Procedure 
Participants were randomly assigned to either the immediate-JOL condition, the 
prestudy-JOL condition, or the no-JOL condition.  All participants completed two blocks, and 
the order of blocks was counterbalanced across participants.  In each block, there were a study 
phase and a test phase.  In the study phase, participants studied 40 word pairs, including 20 
related pairs and 20 unrelated pairs.  Each pair was presented for 4 seconds.  In the immediate-
JOL condition, after each pair was presented for 4 seconds, the word pair disappeared and a JOL 
prompt (“Likelihood to recall?”) appeared.  Here, participants were given a maximum of 5 
seconds to rate how likely they were to recall the word on the right-hand side of the pair when 
provided with the word on the left-hand side on a later memory test (from 0-100, with 0 = not 
likely at all and 100 = totally likely).  They were also told to fine-tune their judgments by using 
the entire 100-point percentage scale.  When 5 seconds were up, the screen cleared and the 
program automatically proceeded to the next word pair.   
In the prestudy-JOL condition, before each word pair was presented, participants saw a 
statement informing them of the type of the upcoming pair (“You are about to study a 
related/unrelated word pair”), along with a JOL prompt (“Likelihood to recall?”) beneath it.  
They were given 5 seconds to provide a JOL for the upcoming word pair on a 0-100 scale, as 
in the immediate-JOL condition.  When 5 seconds were up, the screen cleared and the given 
word pair was presented for 4 seconds.  In the no-JOL condition, the only difference from the 
immediate-JOL condition was that after each word pair was presented for 4 seconds, participants 
were not required to make any JOL ratings.  Instead, participants saw a screen with two blank 
boxes and they were instructed to check both boxes within 5 seconds.  This operation, 
adapted from Bowen et al. (2020), was meant to discourage participants from writing 
down the words and to check whether they were paying attention throughout the study phase.   
In the test phase, participants completed three consecutive associative recall tests, with 
each test preceded by a 1-min buffer task of math problem solving.  The procedure in the test 
phase was the same as in Experiment 1.  After completing both blocks, participants were 
required to complete a very brief post-experiment survey, which asked whether they had taken 
notes during the experiment and whether they had any feedback or concerns regarding the 
current experiment. 
Results 
ANOVA Results for JOLs 
A 2 (Pair type: related, unrelated) × 2 (JOL condition: prestudy-JOL, immediate-JOL) 
mixed ANOVA revealed a main effect of pair type, F(1, 72) = 240.87, MSE = 180.48, ηp² = .77, 
p < .001, as related word pairs (M = 67.60, SD = 18.99) elicited higher JOLs than unrelated word 
pairs (M = 32.43, SD = 18.91).  There was also a main effect of JOL condition, F(1, 72) = 4.87, 
MSE = 469.36, ηp² = .06, p = .03, as immediate JOLs (M = 53.64, SD = 27.31) were overall 
higher than prestudy JOLs (M = 45.76, SD = 23.49).  Last, there was a Pair type × JOL condition 
interaction, F(1, 72) = 18.55, MSE = 180.48, ηp² = .20, p < .001.  As can be seen in Figure 3.1, 
the interaction was driven by the fact that immediate JOLs were higher than prestudy JOLs for 
related pairs (Ms = 75.61 vs. 58.18), but not for unrelated pairs (Ms = 31.67 vs. 33.33).   
 
Figure 3.1. JOLs for related and unrelated pairs across immediate- and prestudy-JOL conditions 
in Experiment 2.  Error bars are based on SEs. 
ANOVA Results for Recall 
A 2 (Pair type: related, unrelated) × 3 (JOL condition: prestudy-JOL, immediate-JOL, no-
JOL) × 3 (Test: 1, 2, 3) mixed ANOVA revealed a main effect of JOL condition, F(2, 110) = 
7.48, MSE = .15, ηp² = .12, p = .001, a main effect of pair type, F(1, 110) = 698.70, MSE = .05, 
ηp² = .86, p < .001, and a Pair type × Test interaction, F(2, 220) = 10.36, MSE = .0007, ηp² = .09, 
p < .001.  Post hoc tests suggested that recall in the immediate-JOL condition (M = .64, SD 
= .29) was overall better than in the no-JOL condition (M = .55, SD = .28) and in the prestudy-
JOL condition (M = .50, SD = .30), while there was no significant difference between the latter 
two JOL conditions.  Not surprisingly, recall was also better for related pairs (M = .79, SD = .15) 
than for unrelated pairs (M = .35, SD = .23).  Meanwhile, the pair type effect held on every 
test, as related pairs were always recalled better across the three test cycles (see 
Figure 3.2). 
Although the JOL condition × Pair type interaction did not reach the conventional level 
of statistical significance, I still conducted further analyses to compare recall performance 
between the three JOL conditions separately for related and unrelated word pairs, because this is 
critical for testing the predictions of the changed-goal hypothesis and the cue-strengthening 
hypothesis.  Specifically, I conducted two additional 3 (JOL condition: prestudy-JOL, 
immediate-JOL, no-JOL) × 3 (Test: 1, 2, 3) mixed ANOVAs, one for related word pairs and the 
other for unrelated word pairs.   
For related word pairs, the ANOVA revealed a main effect of JOL condition, F(2, 110) = 
10.04, MSE = .05, ηp² = .15, p < .001.  As shown in Figure 3.2, recall performance was better in 
the immediate-JOL condition (M = .86, SD = .11) than in the no-JOL condition (M = .76, SD 
= .15) and in the prestudy-JOL condition (M = .74, SD = .16), ps ≤ .001, while there was no 
difference in recall between the latter two JOL conditions.  This suggests that immediate JOLs 
produced positive reactivity for related word pairs, whereas prestudy JOLs did not.   
As for unrelated word pairs, a main effect of JOL condition was found, F(2, 110) = 4.54, 
MSE = .15, ηp² = .08, p = .013.  This main effect was driven by the fact that recall was lower in 
the prestudy-JOL condition (M = .26, SD = .19) than in the immediate-JOL condition (M = .42, 
SD = .24), p = .009.  Although a visual inspection of Figure 3.2 suggests numerical 
differences in recall between the immediate- and no-JOL conditions (M = .35, SD = .23 for the 
latter) and between the prestudy- and no-JOL conditions, post hoc tests indicated that neither 
difference reached statistical significance.  Therefore, neither immediate nor prestudy JOLs had 
reactive effects on recall for unrelated word pairs.   
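Because reactivity is defined relative to the no-JOL baseline, the condition means reported above can be summarized as simple difference scores.  The following sketch is illustrative only: the dictionary layout and function name are hypothetical, and the numerical differences it returns say nothing about statistical significance, which is determined by the ANOVAs and post hoc tests described in the text.

```python
# Mean proportions recalled in Experiment 2, as reported in the text.
recall_means = {
    "related": {"immediate": 0.86, "prestudy": 0.74, "no_jol": 0.76},
    "unrelated": {"immediate": 0.42, "prestudy": 0.26, "no_jol": 0.35},
}


def reactivity(means: dict) -> dict:
    """Difference between each JOL condition and the no-JOL baseline."""
    baseline = means["no_jol"]
    return {
        condition: round(mean - baseline, 2)
        for condition, mean in means.items()
        if condition != "no_jol"
    }


scores = {pair_type: reactivity(means) for pair_type, means in recall_means.items()}

for pair_type, diffs in scores.items():
    print(pair_type, diffs)
```

Only the +.10 difference for immediate JOLs on related pairs was statistically reliable; the remaining differences from baseline did not reach significance.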
 
Figure 3.2. Associative recall for related and unrelated pairs across immediate-, prestudy-, and 
no-JOL conditions in Experiment 2.  Panel A = recall test 1.  Panel B = recall test 2.  Panel C = 
recall test 3.  Panel D = average recall across all three tests.  Error bars are based on SEs. 
Model Results 
The associative recall data were fit to the same dual-retrieval model as in Experiment 1 
(Chang, 2019).  As can be seen in Table 3.1, the model delivered excellent fits to the recall data 
across all six possible combinations between JOL condition (immediate-, prestudy-, and no-JOL) 
and pair type (related, unrelated), with an average G2(1) of 1.44.  For related word pairs, the D 
parameter was higher in the immediate-JOL condition than in the prestudy-JOL condition and in 
the no-JOL condition (.81 vs. .68 vs. .68), ∆G2s > 9.84, ps < .002.  This suggests that immediate 
JOLs enhanced participants’ direct access to verbatim details whereas prestudy JOLs did not.  
Meanwhile, the F parameter was significantly higher in the prestudy-JOL condition (.08) than 
the other two JOL conditions (.04 and .05), ∆G2s > 4.74, ps < .029, indicating that prestudy JOLs 
induced more forgetting of verbatim traces.   
Table 3.1 
Dual-Retrieval Model Fits and Parameter Estimates for Experiment 2 
Pair type   JOL condition    G2     D     F     J1    J2    J3    R 
Related     Immediate-JOL    .24    .81   .04   .67   .86   .87   .47 
            Prestudy-JOL     2.20   .68   .08   .59   .75   .75   .42 
            No-JOL           2.76   .68   .05   .57   .79   .86   .43 
Unrelated   Immediate-JOL    1.34   .35   .03   .60   .81   .81   .16 
            Prestudy-JOL     .19    .24   .13   .40   .66   .77   .10 
            No-JOL           1.92   .31   .03   .40   .72   .76   .11 
Note. D = direct access parameter; F = forgetting parameter; J1 = familiarity 
judgment parameter for test 1; J2 = familiarity judgment parameter for test 2; J3 = 
familiarity judgment parameter for test 3; R = reconstruction parameter.  
Parameters that differed significantly between immediate-, prestudy-, and no-JOL 
conditions are printed in boldface. 
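The fit summary for Table 3.1 can be checked with a few lines of arithmetic.  The sketch below averages the six G2(1) values from the table and compares each one against 3.84, the critical value of a chi-square distribution with one degree of freedom at α = .05 that is used to evaluate model fit elsewhere in this dissertation.  The variable names are hypothetical.

```python
# G2(1) fit statistics from Table 3.1, keyed by (pair type, JOL condition).
g2_values = {
    ("related", "immediate"): 0.24,
    ("related", "prestudy"): 2.20,
    ("related", "no_jol"): 2.76,
    ("unrelated", "immediate"): 1.34,
    ("unrelated", "prestudy"): 0.19,
    ("unrelated", "no_jol"): 1.92,
}

# Critical chi-square value for 1 degree of freedom at alpha = .05.
CRITICAL_VALUE = 3.84

mean_g2 = sum(g2_values.values()) / len(g2_values)
all_below_critical = all(g2 < CRITICAL_VALUE for g2 in g2_values.values())

print(f"Average G2(1) = {mean_g2:.2f}")
print(f"Every cell below the critical value: {all_below_critical}")
```

A G2 value below the critical value means that the null hypothesis of adequate fit cannot be rejected for that cell.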
 
Regarding the unrelated pairs, the D parameter was comparable between the immediate- 
and no-JOL condition (.35 vs. .31), but it was higher in those two JOL conditions relative to the 
prestudy-JOL condition (.24), ∆G2s > 12.38, ps < .001.  This suggests that immediate JOLs did 
not improve direct access to verbatim details, while prestudy JOLs undermined it.  Meanwhile, 
the F parameter was again higher in the prestudy-JOL condition than in the other two JOL 
conditions (.13 vs. .03 vs. .03), ∆G2s > 23.09, ps < .001, suggesting that making prestudy JOLs 
made unrelated pairs more forgettable.  Additionally, unrelated word pairs felt more familiar 
when participants had made immediate JOLs after encoding them, as the J1 parameter was higher 
in the immediate-JOL condition compared to the other two JOL conditions (.60 vs. .40 vs. .40), 
∆G2s > 3.89, ps < .049.  Last, immediate JOLs also elevated the R parameter compared to the 
prestudy JOL condition (.16 vs. .11), ∆G2 = 6.76, p = .009, indicating that participants found it 
easier to reconstruct the target word of a pair based on partial information when they had made a 
JOL for the pair after rather than before encoding it. 
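The p values attached to the ∆G2 statistics above can be approximated with a likelihood-ratio test.  The sketch below assumes that each between-condition comparison constrains a single parameter, so that ∆G2 is referred to a chi-square distribution with one degree of freedom; this assumption is mine, although it reproduces the reported p values.  For one degree of freedom the upper-tail probability reduces to erfc(sqrt(x / 2)), so only the standard library is needed.  The function name is hypothetical.

```python
from math import erfc, sqrt


def lr_test_p(delta_g2: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 df (assumed df)."""
    return erfc(sqrt(delta_g2 / 2.0))


# R parameter, immediate- vs. prestudy-JOL condition, unrelated pairs.
p_reconstruction = lr_test_p(6.76)

# Smallest reported difference statistic for the J1 comparisons, unrelated pairs.
p_familiarity_bound = lr_test_p(3.89)

print(f"Delta G2 = 6.76 -> p = {p_reconstruction:.3f}")
print(f"Delta G2 = 3.89 -> p = {p_familiarity_bound:.3f}")
```

Under the same assumption, a ∆G2 of 3.84 corresponds to p ≈ .05, which is why 3.84 serves as the critical value.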
Discussion 
Previously, prestudy JOLs were mostly used to isolate the contribution of metacognitive 
beliefs to JOLs.  Thus, most prior studies that administered prestudy JOLs were interested in 
comparing prestudy and immediate JOLs.  In such studies, some researchers compared 
memory performance between the two JOL conditions and reported that recall in the immediate-
JOL condition was better than in the prestudy-JOL condition (Mueller et al., 2013, 2016; Undorf 
& Bröder, 2020).  Still, other studies found no difference in memory performance between the 
two JOL conditions (Price & Harrison, 2017; Witherby & Tauber, 2017a).  My results aligned 
with the former.  Notably, the current study was the first to directly compare a prestudy-JOL 
condition to a no-JOL control condition so as to examine whether prestudy JOLs also produce 
reactivity like immediate JOLs.  With the no-JOL condition serving as a baseline, I compared 
reactivity between prestudy JOLs and immediate JOLs, which provided an attractive testbed for 
both the cue-strengthening hypothesis and the changed-goal hypothesis.   
My results showed that prestudy JOLs produced no reactivity in associative recall for 
either related or unrelated word pairs.  This is consistent with the cue-strengthening hypothesis, 
as it predicts that prestudy JOLs should have no effects on subsequent recall or weaker effects 
compared to immediate JOLs because prestudy JOLs were made based on fewer and less 
diagnostic cues than immediate JOLs.  On the other hand, the current result is contrary to what 
the changed-goal hypothesis predicts.  Even though prestudy JOLs were significantly higher for 
related than for unrelated pairs, suggesting that participants were aware of the differences in 
learning difficulty between these two types of word pairs, prestudy JOLs produced neither 
positive reactivity for related pairs nor negative reactivity for unrelated pairs.  Thus, the absence 
of reactive effects of prestudy JOLs provided no evidence that participants had changed their 
learning goal by focusing more on studying related pairs at the cost of unrelated pairs.  
Meanwhile, it was also found that immediate JOLs produced positive reactivity for related word 
pairs but no reactivity for unrelated word pairs, which is again more consistent with the cue-
strengthening hypothesis than the changed-goal hypothesis.   
Next, the model results shed light on the reactive effects of JOLs at the retrieval process 
level.  As shown in Table 3.1, immediate JOLs enhanced recollection of verbatim details for 
related pairs but not for unrelated pairs, which echoes the ANOVA results at the behavioral 
level.  This result suggests that immediate JOLs only enhanced recollection when the cues 
embedded in the word pairs were useful in subsequent associative recall tests (i.e., cue-target 
relatedness in the current scenario), which is again consistent with the cue-strengthening 
hypothesis.  It is noteworthy that for both related pairs in Experiment 2 and strongly related pairs 
in Experiment 1, the positive reactivity of item-level JOLs was consistently 
located in the D parameter.  This provides evidence that item-level JOLs improved memory for 
related pairs mainly by enhancing recollection of verbatim details.   
Although prestudy JOLs did not modify recall performance at the behavioral level, they did 
impair recollection for unrelated pairs at the process level.  At first sight, this seems to be 
conceptually consistent with the changed-goal hypothesis.  Here, one may argue that perhaps 
negative reactivity of prestudy JOLs was just not strong enough to manifest itself at the behavioral 
level, considering that prestudy JOLs were less sensitive to cue-target relatedness relative to 
immediate JOLs (see Figure 3.1).  Thus, participants’ awareness of differences in item difficulty 
in the prestudy JOL condition may not be as sharp as in the immediate JOL condition, which 
thus provides less motivation for them to switch their learning goals, resulting in weaker JOL 
reactivity.  However, this speculation could not explain why prestudy and immediate JOLs 
produced completely different effects at the process level: immediate JOLs enhanced 
recollection for related pairs whereas prestudy JOLs impaired recollection for unrelated pairs.  If 
the changed-goal hypothesis assumes that both immediate and prestudy JOLs produce reactivity 
by prompting participants to switch learning goals and focus more on related pairs at the 
expense of unrelated pairs, the effects of prestudy and immediate JOLs should vary only in 
magnitude, not in pattern. 
Last, it was observed that prestudy JOLs increased forgetting for both related and unrelated 
word pairs.  Why is that?  Recall that the time of each study trial was matched between the three 
JOL conditions in the current experiment.  That is, each word pair was presented for 4s and was 
either preceded by a 5-s JOL phase in the prestudy-JOL condition, followed by a 5-s JOL phase 
in the immediate-JOL condition, or followed by a 5-s attention check in the no-JOL condition.  
Thus, in such a case, it is possible that making a prestudy JOL for the next pair after the 
presentation of the prior pair might have interfered with the consolidation for the prior pair.  As a 
result, the verbatim traces of the word pairs may not have been as effectively rehearsed and 
stored in the prestudy JOL condition as in the other two JOL conditions, which made them less 
stable and thus more forgettable.  Notably, the increase in forgetting was larger for unrelated 
pairs than for related pairs, possibly because unrelated pairs require more cognitive resources to 
consolidate relative to related pairs, and thus they should suffer more from the interference in 
consolidation.  This may also explain why prestudy JOLs impaired recollection for unrelated but 
not for related pairs, a possibility that future research will need to test.   
 
CHAPTER 4 
EXPERIMENT 3 
As shown in Double et al.’s (2018) meta-analysis, most prior JOL reactivity studies were 
conducted with word pairs, whereas only a few studies used lists of single words.  As far as I am 
aware, Stevens and Pierce (2019) was the first study to examine JOL reactivity with categorical 
single-word lists.  Moreover, apart from the item-level JOLs that are used in most prior 
experiments, they also examined reactivity of list-level JOLs, in which participants were required 
to estimate how many words they expected to recall from a given word list (Mazzoni & Nelson, 
1995; Sahakyan et al., 2004).  Their results showed that only list-level but not item-level JOLs 
improved cued recall for categorical lists.  Here, a cued recall test is similar to a free recall test, 
with the only difference being that the categorical labels of the studied list were presented as test 
cues.  Note that there is an important difference between item- and list-level JOLs: The former 
should direct participants’ attention to item-specific cues while the latter should direct 
participants’ attention to inter-item relational cues.  Cued recall tests are more sensitive to 
relational cues than to item-specific cues, which explains why only list-level but not item-level 
JOLs produced positive reactivity for categorical lists in cued recall.   
In addition, Myers et al. (2020) reported that asking participants to make item-level JOLs 
for related word pairs only enhanced their subsequent performance in associative recall and 
recognition tests but not in free recall tests.  Note that associative recall and recognition tests 
provide participants with the cue words and require participants to recall or recognize a target 
word that is paired with a specific cue word.  In other words, participants have to retain pair-
specific cue-target relations.  Nevertheless, free recall tests do not provide participants with the 
cue words.  Instead, free recall tests only require participants to recall as many target words as 
they can remember in any order, without considering the cue-target association within specific 
pairs.  Thus, associative recall and recognition are sensitive to pair-specific cue-target relations, 
which were strengthened by item-level JOLs, whereas free recall is sensitive to inter-pair target-
target relations, to which item-level JOLs were less beneficial.  
Taken together, Stevens and Pierce’s and Myers et al.’s studies are both in line with the 
cue-strengthening hypothesis that JOLs only produce positive reactivity when the cues 
strengthened by JOLs are favored by the subsequent memory test.  In Experiment 3, I further 
tested the cue-strengthening hypothesis using a design that built upon both Stevens and Pierce (2019) and 
Myers et al. (2020).  Specifically, I used a 2 (Target-target relation: related, unrelated) × 2 (Test 
format: free recall, associative recall) × 3 (JOL condition: item-JOL, list-JOL, no-JOL) mixed 
design.  Target-target relation and test format were manipulated within subjects and JOL 
condition was manipulated between subjects.  To manipulate target-target relation, every four 
consecutive word pairs were grouped into a list, and the targets of the four pairs on the same list 
were either semantically related or unrelated.  Such target-target relatedness encourages inter-
pair relational processing, in contrast to the cue-target relatedness, which primarily invites item-
specific processing.  Consistent with this notion, target-target relatedness was found to enhance 
free recall, whereas it either impaired or had no effect on associative recall (Brainerd & Reyna, 
2010; Schwenn & Underwood, 1968; Underwood et al., 1965).   
In Experiment 3, the core prediction of the cue-strengthening hypothesis is that JOL 
reactivity depends on the interaction between material type, JOL type, and test format.  First, list-
level JOLs should produce positive reactivity for target-target related pairs but little-to-no 
reactivity for target-target unrelated pairs in free recall.  This is because list-level JOLs direct 
participants’ attention to inter-pair target-target relations, to which free recall tests are sensitive.  
However, target-target unrelated pairs are less likely to enjoy such benefits than target-target 
related pairs, as there is no inherent inter-pair relatedness for list-level JOLs to draw upon and 
strengthen in those pairs.  Second, list-level JOLs should produce either negative or no reactivity 
for target-target related pairs and little-to-no reactivity for target-target unrelated pairs in 
associative recall.  Here, associative recall is sensitive to cue-target relatedness rather than target-
target relatedness, and target-target relatedness is either harmful or irrelevant to associative recall 
(Brainerd & Reyna, 2010; Schwenn & Underwood, 1968; Underwood et al., 1965).  Therefore, 
list-level JOLs, which focus people’s attention on the relatedness among target-target related 
pairs, should not benefit associative recall performance.  Third, item-level JOLs should 
produce negative or no reactivity for target-target related pairs and little-to-no reactivity for 
target-target unrelated pairs in free recall, because item-level JOLs are expected to emphasize 
item-specific features rather than inter-item relations (Mitchum et al., 2016; Myers et al., 2020), 
whereas free recall is more sensitive to the latter than the former.  Fourth, item-level JOLs should 
also have little-to-no reactivity in associative recall for both target-target related and unrelated 
pairs.  This is because within-pair semantic relation is a dominant cue in making item-level 
JOLs, but such a cue was absent in both target-target related and unrelated pairs.  Thus, 
considering that no JOL reactivity was detected for cue-target unrelated pairs in prior studies 
(e.g., Soderstrom et al., 2015; Myers et al., 2020), item-level JOLs are not expected to improve 
associative recall for either target-target related or unrelated pairs. 
In summary, based on the cue-strengthening hypothesis, it is predicted that for target-
target related pairs, list-level JOLs would produce positive reactivity in free recall but negative 
or no reactivity in associative recall.  On the contrary, item-level JOLs should generate negative 
or no reactivity in free recall but little-to-no reactivity in associative recall.  As for target-target 
unrelated pairs, little-to-no reactivity of both item-level and list-level JOLs is predicted in both 
free and associative recall. 
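The predictions summarized above can be laid out as a lookup table over the eight cells formed by target-target relation, JOL type, and test format.  The sketch below simply transcribes the predictions stated in the text; the dictionary, the key labels, and the function name are hypothetical conveniences.

```python
# Predicted JOL reactivity under the cue-strengthening hypothesis,
# keyed by (target-target relation, JOL type, test format).
PREDICTED_REACTIVITY = {
    ("related", "list", "free"): "positive",
    ("related", "list", "associative"): "negative or none",
    ("related", "item", "free"): "negative or none",
    ("related", "item", "associative"): "little to none",
    ("unrelated", "list", "free"): "little to none",
    ("unrelated", "list", "associative"): "little to none",
    ("unrelated", "item", "free"): "little to none",
    ("unrelated", "item", "associative"): "little to none",
}


def predicted_reactivity(relation: str, jol_type: str, test_format: str) -> str:
    """Look up the prediction for one cell of the design."""
    return PREDICTED_REACTIVITY[(relation, jol_type, test_format)]


for cell, prediction in PREDICTED_REACTIVITY.items():
    print(cell, "->", prediction)
```

Laid out this way, the only cell in which positive reactivity is predicted is list-level JOLs for target-target related pairs in free recall.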
Method 
Participants 
Participants were 122 Cornell undergraduates (Mage = 20.15, SDage = 1.23) who 
participated for course credits.  Forty-two participants were randomly assigned to the item-JOL 
condition, 38 participants were randomly assigned to the list-JOL condition, and 42 participants 
were randomly assigned to the no-JOL condition.  The sample size per condition was 
comparable to that used in Myers et al. (2020).  One participant in the item-level JOL condition 
who failed to provide JOLs for over 80% of the study trials was removed from the analyses of 
both JOLs and recall.  In the end, data from 41 participants in the item-level JOL condition, 38 in 
the list-level JOL condition, and 42 in the no-JOL condition were included in the analyses. 
Materials 
The experiment was programmed and administered via Qualtrics.  The materials were 80 
word pairs, which were evenly divided into 20 lists of four word pairs.  The cue words of all 
pairs were chosen from the Nelson free association norms (D. L. Nelson et al., 2004).  For half of 
the lists, the target words of the four word pairs on the same list were the first four exemplars of 
a categorical list in the Van Overschelde et al. (2004) norms.  For the other half of the lists, the 
target words of the four word pairs on a given list shared no inter-pair target-target relation, as 
they were randomly picked from four separate categorical lists in the Van Overschelde et al. 
(2004) norms.  I made sure that there was no cue-target association within all word pairs, and 
that the concreteness, frequency, and length of both cue and target words were comparable 
between target-target related pairs and unrelated pairs.  The materials for Experiment 3 are in 
Appendix D. 
Procedure 
The experimental procedure was very similar to Experiment 1 except for three 
modifications.  First, in the study phase of each block, participants studied 10 lists of four word 
pairs instead of 36 word pairs.  Second, participants took three consecutive free recall tests in one 
block and three consecutive associative recall tests in the other block.  In each block, participants 
were not told in advance whether they were going to take a free recall test or an associative recall 
test.  The procedure for associative recall tests was the same as in Experiment 1.  In free recall 
tests, participants were given a maximum of 3 minutes to write down as many of the words from the 
right-hand side of the studied pairs as they could.  They were told to write down the words in any 
order they liked and not to worry about spelling.   
Third, I used slightly different instructions for the item-JOL condition compared to the 
prior experiments, and I added a list-JOL condition, in parallel to the item-JOL and no-JOL 
conditions.  In the item-JOL condition of the current experiment, participants were told that after 
the presentation of each pair, they would be asked to rate how likely they were to recall the word on 
the right-hand side of the pair in a later memory test (from 0-100, with 0 = not likely at all and 
100 = totally likely).  Here, because participants were not informed of the test format in advance, 
I removed the part of JOL instruction that was specific to associative recall (“when provided 
with the word on the left-hand side on a later memory test”), which was explicitly stated in the 
prior two experiments.  Moreover, participants were explicitly informed that they might or might 
not be provided the words on the left-hand side of the pairs in the later memory tests.   
In the list-JOL condition, each word pair was presented for 10s without any JOL prompt 
just like in the no-JOL condition.  After each list of four word pairs was presented, participants 
were prompted to make a list-level JOL during a 10-s interval between consecutive lists 
(“Among the words on the right-hand side of the four word pairs you just studied, how many of 
them do you expect to remember on a later memory test?”).  Participants were required to enter a 
whole number between 0 and 4 into a blank box within 10 seconds.  Similar to the instructions 
for item-level JOLs, the instructions for list-level JOLs also explicitly informed participants that 
they might or might not be provided with the words on the left-hand side of the pairs in the later 
memory test.  An overview of the experimental design of Experiment 3 is shown in Figure 4.1.   
[Figure 4.1 schematic.  Pair type: 40 target-target related pairs (e.g., convent – steel, rectangle – iron, circus – bronze, toad – lead) and 40 target-target unrelated pairs (e.g., tall – lime, afraid – pencil, sock – tangle, atom – salmon).  JOL condition: item-level JOL ("Likelihood to recall?"), list-level JOL ("Among the words on the right-hand side of the four word pairs you just studied, how many of them do you expect to remember?"), or no JOL.  Test format: free recall test ("Recall the words on the right-side of the pairs you just studied.") or associative recall test (e.g., convent – ?, rectangle – ?).]
 
Figure 4.1. An overview of the experiment design of Experiment 3.  Pair type and test format 
were manipulated within subjects, while JOL condition was manipulated between subjects. 
Results 
ANOVA Results for JOLs 
Two 2 (Target-target relation: related, unrelated) × 2 (Test format: associative recall, free 
recall) × 2 (Test order: free/associative, associative/free) mixed ANOVAs were conducted on 
item-level JOLs and on list-level JOLs, respectively.  Although participants were not informed of 
the test format during the study phase of either block, they may have expected the same test format as in 
the first block during the study phase of the second block, which could have affected their 
JOLs.  Thus, to investigate this possibility, I included both test format and test order in the 
ANOVAs.   
The ANOVA for item-level JOLs showed a main effect of target-target relation in that 
item-level JOLs were higher for target-target related pairs (M = 37.02, SD = 17.70) than for 
target-target unrelated pairs (M = 33.21, SD = 16.71), F(1, 38) = 13.03, MSE = 41.38, ηp² = .26, 
p < .001.  Meanwhile, the ANOVA revealed a main effect of test order as item-level JOLs were 
higher when free recall was administered in the first block and associative recall in the second 
(M = 41.32, SD = 16.43) than in the reverse order (M = 28.76, SD = 15.80), F(1, 38) = 7.66, MSE 
= 844.33, ηp² = .17, p = .009.  There was a Test format × Target-target relation interaction, F(1, 
38) = 5.10, MSE = 28.43, ηp² = .12, p = .030.  Nevertheless, post hoc tests revealed that item-
level JOLs did not differ significantly between associative and free recall for either target-target 
related or unrelated pairs.  Additionally, a Test format × Test order interaction was present, F(1, 
38) = 6.55, MSE = 132.50, ηp² = .15, p = .015.  Here, post hoc tests demonstrated no significant 
difference in item-level JOLs between associative and free recall in either test order. 
Similarly, list-level JOLs were also higher for pairs whose targets were related (M = 2.42, 
SD = .95) than those whose targets were unrelated (M = 1.68, SD = .79), F(1, 35) = 45.53, MSE 
= .42, ηp² = .57, p < .001.  Thus, participants incorporated information about inter-pair target-
target relation into both item- and list-level JOLs.  In addition, there was a Test order × Target-
target relation interaction, F(1, 35) = 4.91, MSE = .42, ηp² = .12, p = .033.  However, post hoc 
tests suggested that the effect of target-target relatedness was significant in both test orders. 
I also conducted an additional 2 (Target-target relation: related, unrelated) × 2 (JOL 
condition: item-JOL, list-JOL) mixed ANOVA to compare the sensitivity to target-target 
relatedness between item-level JOLs and list-level JOLs.  Here, I first converted list-level JOLs 
from a 0-4 scale to a 0-100 scale.  This was done by dividing list-level JOLs by four and then 
multiplying the outcome by 100.  The ANOVA showed a main effect of target-target relation, 
F(1, 77) = 60.01, MSE = 28.17, ηp² = .44, p < .001, a main effect of JOL condition, F(1, 77) = 
10.69, MSE = 328.3, ηp² = .12, p = .002, and a Target-target relation × JOL condition interaction, 
F(1, 77) = 10.80, MSE = 28.17, ηp² = .12, p = .002.  Post hoc tests revealed that item-level JOLs 
were overall lower than list-level JOLs, and that the target-target relation effect was larger on 
list-level JOLs than on item-level JOLs.  Therefore, item-level JOLs were less sensitive to target-
target relation than list-level JOLs. 
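The rescaling of list-level JOLs described above (divide by four, then multiply by 100) can be sketched as follows.  The function name and the range check are hypothetical additions for illustration; the two inputs are the mean list-level JOLs reported earlier in this section.

```python
def list_jol_to_percent(list_jol: float, list_length: int = 4) -> float:
    """Convert a list-level JOL (0 to list_length items) to a 0-100 scale."""
    if not 0 <= list_jol <= list_length:
        raise ValueError("list-level JOL must lie between 0 and the list length")
    return list_jol / list_length * 100


# Mean list-level JOLs reported in the text, on the original 0-4 scale.
related_percent = round(list_jol_to_percent(2.42), 2)
unrelated_percent = round(list_jol_to_percent(1.68), 2)

print(f"Target-target related pairs: {related_percent}")
print(f"Target-target unrelated pairs: {unrelated_percent}")
```

After conversion, the list-level means are expressed on the same 0-100 scale as the item-level JOLs and can be entered into the same ANOVA.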
ANOVA Results for Associative Recall 
A 2 (Target-target relation: related, unrelated) × 3 (JOL condition: item-JOL, list-JOL, 
no-JOL) × 3 (Test: 1, 2, 3) × 2 (Test order: free/associative, associative/free) mixed ANOVA 
was conducted for associative recall.  The ANOVA indicated a main effect of target-target 
relation, F(1, 115) = 41.98, MSE = .02, ηp² = .28, p < .001, a main effect of test, F(2, 230) = 
19.63, MSE = .004, ηp² = .15, p < .001, a Target-target relation × JOL condition interaction, F(2, 
115) = 4.91, MSE = .02, ηp² = .08, p = .009, a Target-target relation × Test order interaction, F(1, 
115) = 5.83, MSE = .02, ηp² = .05, p = .017, and a Target-target relation × Test order × JOL 
condition interaction, F(2, 115) = 3.17, MSE = .02, ηp² = .05, p = .046.  The two main effects 
were driven by the fact that associative recall was higher for target-target related pairs (M = .27, 
SD = .25) than for target-target unrelated pairs (M = .21, SD = .21), and was better on the first 
recall test (M = .26, SD = .25) than on the second (M = .23, SD = .23) and the third (M = .23, SD 
= .23) recall tests.  Post hoc tests for the Target-target relation × JOL condition interaction 
showed that associative recall did not reliably differ between the three JOL conditions (item-
JOL, list-JOL, no-JOL) for either target-target related or unrelated pairs.  This suggests that 
neither item-level JOLs nor list-level JOLs produced reactivity in associative recall for either 
type of word pairs (see Figure 4.2).  Post hoc tests for the Target-target relation × Test order 
interaction revealed that associative recall was significantly higher for target-target related pairs 
than for target-target unrelated pairs regardless of the test order.  This suggests that the target-
target relatedness effect on associative recall was robust no matter whether associative recall was 
administered in the first or second block.  Given that the effect of Target-target relation was not 
substantially modified by either JOL condition or test order, no further post hoc tests were 
conducted for the Target-target relation × Test order × JOL condition interaction. 
 
 
Figure 4.2. Associative recall for target-target related and target-target unrelated pairs across 
item-, list-, and no-JOL conditions in Experiment 3.  Panel A = recall test 1.  Panel B = recall test 
2.  Panel C = recall test 3.  Panel D = average recall across all three tests.  Error bars are based on 
SEs. 
ANOVA Results for Free Recall 
The 2 (Target-target relation: related, unrelated) × 3 (JOL condition: item-level, list-
level, no JOL) × 3 (Test: 1, 2, 3) × 2 (Test order: free/associative, associative/free) mixed 
ANOVA for free recall revealed a main effect of JOL condition, F(2, 115) = 5.22, MSE = .16, 
ηp² = .08, p = .007.  Free recall was higher in the list-JOL condition (M = .31, SD = .26) than in 
the item-JOL (M = .21, SD = .19) and no-JOL (M = .19, SD = .18) conditions, while there was no 
difference between the latter two conditions.  Meanwhile, a main effect of test order was present, 
F(1, 115) = 7.73, MSE = .16, ηp² = .06, p = .006, as free recall was better when participants took the 
free recall test first.  Also, there was a main effect of target-target relation, F(1, 118) = 101.50, 
MSE = .04, ηp² = .46, p < .001, as free recall was better for target-target related pairs (M = .31, 
SD = .24) than for target-target unrelated pairs (M = .16, SD = .16).  In addition, there was a 
Target-target relation × JOL condition interaction, F(2, 115) = 20.31, MSE = .04, ηp² = .26, p 
< .001.  As shown in Figure 4.3, the free recall advantage in the list-JOL condition over the item- 
and no-JOL conditions was only reliable for target-target related pairs (Ms = .48 vs. .17 vs. .20) 
but not for target-target unrelated pairs (Ms = .17 vs. .12 vs. .14).  Last, there was a Target-target 
relation × Test interaction, F(2, 230) = 4.47, MSE = .002, ηp² = .04, p = .012.  However, post hoc 
tests showed that the effects of target-target relation did not change substantially across the three 
free recall tests, as target-target related pairs were always recalled better than target-target 
unrelated pairs. 
 
Figure 4.3.  Free recall for target-target related and target-target unrelated pairs across item-, 
list-, and no-JOL conditions in Experiment 3.  Panel A = recall test 1.  Panel B = recall test 2.  
Panel C = recall test 3.  Panel D = average recall across all three tests.  Error bars are based on 
SEs. 
Model Results for Associative Recall 
The same dual-retrieval model (Chang, 2019) was used as in Experiments 1 and 2.  As 
can be seen in the upper section of Table 4.1, the dual-retrieval model delivered excellent fits to 
the associative recall data across all six possible combinations between JOL condition (item-, 
list-, and no-JOL) and target-target relation (related, unrelated).  The average G2(1) was .87, 
which is again below the critical value of 3.84.  The parameter estimates are also displayed in 
Table 4.1.  For target-target related pairs, the F parameter was lower in the list-JOL condition 
compared to the item-JOL condition (.03 vs. .20), ∆G2 = 7.63, p = .006.  This means that list-
level JOLs reduced forgetting compared to item-level JOLs.  In addition, the J2 and J3 parameters 
were both higher in the item-JOL condition (.95 and .95), relative to the list-JOL (.61 and .49) 
and the no-JOL (.72 and .62) conditions, ∆G2s > 5.24, ps < .022, whereas there were no reliable 
differences between the latter two JOL conditions.  In summary, list-JOLs reduced forgetting for 
target-target related pairs compared to item-JOLs, whereas item-JOLs made target-target related 
pairs feel more familiar after they were reconstructed by searching through a possible set of 
candidate items.  In addition, the ordering of the D parameter for target-target related pairs was 
item-JOL (.29) > list-JOL (.18) > no-JOL (.13).  Here, no pairwise comparison reached the 
conventional level of statistical significance, although the difference between the item- and no-
JOL conditions approached significance, ∆G2 = 3.78, p = .052. 
As for target-target unrelated pairs, similar to target-target related pairs, the F parameter 
was again lower in the list-JOL condition compared to the item-JOL condition (.00 vs. .21), ∆G2 
= 7.63, p = .006, and the J2 and J3 parameters were again lower in the list-JOL condition (.55 
and .50) than in the item-JOL (.93 and .90) condition, ∆G2s > 9.25, ps < .003.  This echoes the 
aforementioned finding that list-level JOLs reduced forgetting compared to the item-level JOLs, 
while item-level JOLs enhanced familiarity relative to list-level JOLs.  Additionally, the J1, J2, 
and J3 parameters were all lower in the list-JOL (.67, .55, and .50) condition than in the no-JOL 
(.80, .67 and .64) condition, ∆G2s > 4.11, ps < .043.  Thus, list-level JOLs systematically 
decreased familiarity for target-target unrelated word pairs.  Meanwhile, the D parameter was 
significantly higher in the item-JOL condition compared to the list-JOL and the no-JOL 
conditions (.29 vs. .13 vs. .08), ∆G2s > 6.62, ps < .013, suggesting that item-level JOLs enhanced 
recollection of verbatim details for target-target unrelated pairs relative to the other two JOL 
conditions.  Last, the ordering of the R parameter was no-JOL (.24) > list-JOL (.21) > item-JOL 
(.05), with all pairwise comparisons yielding significant differences, ∆G2s > 4.42, ps < .036.  
Therefore, when item-level and list-level JOLs were administered, participants found it harder to 
reconstruct the target words of target-target unrelated pairs, compared to when there were no 
JOLs solicited. 
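The link between the G2 statistics and the p values reported above can be illustrated with a short calculation.  The sketch below assumes that each ∆G2 comparison is evaluated against a chi-square distribution with one degree of freedom, which is consistent with the reported p values but is an assumption on my part; the function name is hypothetical.  For one degree of freedom, the upper-tail probability reduces to the complementary error function, so no statistics library is needed.

```python
import math

def chi2_upper_tail_df1(statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    For df = 1 the statistic is a squared standard normal deviate, so
    P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))

CRITICAL_VALUE = 3.84  # critical value cited in the text for model fit

p_at_critical = chi2_upper_tail_df1(CRITICAL_VALUE)  # approximately .05
p_forgetting = chi2_upper_tail_df1(7.63)  # F parameter: list-JOL vs. item-JOL
p_direct_access = chi2_upper_tail_df1(3.78)  # D parameter: item-JOL vs. no-JOL

print(round(p_at_critical, 3), round(p_forgetting, 3), round(p_direct_access, 3))
```

A G2 or ∆G2 value below 3.84 therefore corresponds to p > .05, which is why the D parameter comparison (∆G2 = 3.78) is described as only approaching significance.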
Table 4.1 
Dual-Retrieval Model Fits and Parameter Estimates for Experiment 3 
Test format         Target-target relation  JOL condition     G2     D     F    J1    J2    J3     R
Associative recall  Related                 Item-JOL         .00   .29   .20   .50   .95   .95   .07
Associative recall  Related                 List-JOL        1.61   .18   .03   .65   .61   .49   .23
Associative recall  Related                 No-JOL           .48   .13   .13   .79   .72   .62   .22
Associative recall  Unrelated               Item-JOL         .14   .29   .21   .40   .93   .90   .05
Associative recall  Unrelated               List-JOL        2.69   .13   .00   .67   .55   .50   .21
Associative recall  Unrelated               No-JOL           .29   .08   .10   .80   .67   .64   .24
Free recall         Related                 Item-JOL         .68   .08   .17   .77   .81   .91   .23
Free recall         Related                 List-JOL         .12   .32   .06   .61   .63   .83   .31
Free recall         Related                 No-JOL           .09   .19   .08   .59   .49   .49   .13
Free recall         Unrelated               Item-JOL         .56   .09   .15   .73   .73   .65   .14
Free recall         Unrelated               List-JOL         .05   .08   .20   .70   .74   .73   .16
Free recall         Unrelated               No-JOL           .21   .09   .22   .55   .52   .57   .15
Note. D = direct access parameter; F = forgetting parameter; J1 = familiarity judgment 
parameter for test 1; J2 = familiarity judgment parameter for test 2; J3 = familiarity judgment 
parameter for test 3; R = reconstruction parameter.  Parameters that differed reliably across 
JOL conditions are printed in boldface. 
 
Model Results for Free Recall 
The free recall data were fit to the same dual-retrieval model as the associative recall data 
(Chang, 2019).  As can be seen in the lower section of Table 4.1, the model also delivered 
excellent fits to the free recall data across all six possible combinations between JOL condition 
(item-, list-, and no-JOL) and target-target relation (related, unrelated), with an average G2(1) 
of .29.  For target-target related pairs, the ordering of the D parameter was list-JOL (.32) > no-
JOL (.19) > item-JOL (.08), with all pairwise comparisons yielding significant differences, 
∆G2s > 7.94, ps < .005.  This suggests that list-level JOLs enhanced direct access to verbatim 
traces for target-target related pairs in free recall, whereas item-level JOLs impaired it.  Next, the 
J2 and J3 parameters were higher in the item-JOL condition (.81 and .91) and list-JOL condition 
(.63 and .83) than in the no-JOL condition (.49 and .49), ∆G2s > 7.42, ps < .007.  In addition, the 
list- and item-JOL conditions differed significantly in the J2 parameter, ∆G2 = 4.15, p = .042.  
These results suggest that both item-level and list-level JOLs increased familiarity for target-
target related pairs in free recall, relative to the no-JOL condition.  Last, the R parameter was 
significantly higher in the list-JOL condition (.31) and the item-JOL condition (.23) compared to 
the no-JOL condition (.13), ∆G2s > 5.23, ps < .022.  In contrast, no condition-wise 
differences were found in parameters for target-target unrelated pairs. 
Discussion 
Consistent with the prediction of the cue-strengthening hypothesis, Experiment 3 showed 
that item-level JOLs produced no benefits for either target-target related or unrelated pairs, in 
either associative or free recall.  However, while list-level JOLs had no effects on target-target 
unrelated pairs in either type of recall, they improved free recall (but not associative recall) for 
target-target related pairs.  Recall that the cue-strengthening hypothesis suggests 
that JOL reactivity only arises when the cues strengthened by JOLs are matched with the cues 
used in the memory test.  Here, when target-target related pairs, list-level JOLs, and free recall 
tests were administered, making list-level JOLs should strengthen the target-target relatedness 
among pairs, to which free recall is very sensitive.  Thus, list-level JOLs produced positive 
reactivity for target-target related pairs in free recall, as predicted by the cue-strengthening 
hypothesis.  Some alternative explanations for positive reactivity are that list-level JOLs simply 
offered spaced retrieval practice or that list-level JOLs enhanced participants’ expectancy for 
free recall tests.  However, such accounts would have difficulty explaining why list-level JOLs 
produced a dramatic improvement in free recall for target-target related pairs but not at all for 
target-target unrelated pairs.  Given that there were only four pairs (and thus only four target 
words) on a list and retrieval practice usually produces a robust boost in memory performance 
(see Karpicke, 2017, for a review), if participants were using list-level JOLs merely as retrieval 
practice or if list-level JOLs prompted them to prepare for a free recall test, there should also be 
recall benefits for target-target unrelated pairs.  Additionally, as will be seen in 
Experiment 4, list-level JOLs did not enhance free recall for blocked categorical lists, which 
would be hard to explain if list-level JOLs merely functioned as retrieval practice.  
Apart from the combination of target-target related pairs, list-level JOLs, and free recall 
test, all other scenarios failed to fulfill the match in cues between study materials, JOLs, and 
memory tests, as required by the cue-strengthening hypothesis.  Thus, it is not surprising that no 
JOL reactivity was observed for them.  For example, list-level JOLs produced no reactivity for 
target-target unrelated pairs in free recall, because there was no semantic relation between the 
target words of consecutive pairs, and list-level JOLs were less likely to draw upon and 
strengthen target-target relation.  In addition, list-level JOLs produced no reactivity for target-
target related pairs in associative recall because associative recall is sensitive to information 
specific to each pair rather than inter-pair relation.  Similarly, item-level JOLs produced no 
reactivity for target-target related or unrelated pairs in associative recall, because associative 
recall is particularly sensitive to within-pair relation, but there was no inherent cue-target 
relatedness in either type of pairs for item-level JOLs to strengthen.  Last, item-level JOLs also 
produced no reactivity for target-target related pairs in free recall, because item-level JOLs were 
not able to strengthen the cues favored by free recall: inter-pair relational cues.   
Additionally, it can be seen that both item-level and list-level JOLs were higher for 
target-target related pairs than for target-target unrelated pairs, suggesting that participants 
perceived the former type of pairs as easier to remember than the latter.  Recall that the changed-
goal hypothesis assumes that JOLs will change people’s study goals and prompt them to allocate 
more resources to study easier items at the cost of harder items.  Thus, in the current scenario, 
the changed-goal hypothesis predicts positive reactivity for target-target related pairs and 
negative reactivity for target-target unrelated pairs, regardless of the test format.  However, this 
was not what the data showed: Negative reactivity was found for neither type of pairs and 
positive reactivity for target-target related pairs occurred only in free recall but not in associative 
recall.  Therefore, although the primary goal of Experiment 3 was to test the cue-strengthening 
hypothesis, the results again provide counterevidence against the changed-goal hypothesis. 
A slightly surprising finding is that associative recall was overall better for target-target 
related pairs than for target-target unrelated pairs, given that prior studies typically showed no 
effects or negative effects of target-target relation on associative recall (Brainerd & Reyna, 2010; 
Schwenn & Underwood, 1968; Underwood et al., 1965).  Rivers and Dunlosky’s (2021) recent 
findings provide a possible explanation here.  Their results showed that participants’ recall was 
similar between target-target related and unrelated pairs in both associative and free recall if they 
were told at the beginning that they would take an associative recall test.  However, when 
participants were instructed to expect a free recall test, they had better recall for target-target related 
pairs than for target-target unrelated pairs in both test formats.  Thus, the effects of target-target 
relatedness on recall performance (across both associative and free recall) seem to be moderated 
by test expectancy.  Given that both associative and free recall were higher for target-target 
related pairs than for target-target unrelated pairs in the current study, participants might have been 
more inclined overall to expect a free recall test than an associative recall test, even though we 
provided no explicit information about the test format until the test phase.  Such a tendency to 
expect free recall may be attributed to the salient target-target relatedness among half of the 
pairs, which can prompt participants to focus more on remembering the target words among 
pairs rather than remembering specific cue-target pairings.  However, this explanation is post hoc and 
speculative, and future research is recommended to further test whether study materials can 
modify participants’ expectations about test format. 
At a more fine-grained level, the dual-retrieval model revealed that list-level JOLs 
improved free recall for target-target related pairs by driving up the D, R, and J parameters.  That 
is, making list-level JOLs helped participants better grasp the meaning connection between the 
target words across word pairs, which gave them better access to the verbatim details of 
each specific target word, helped them better reconstruct the target words based on categorical 
membership when those words could not be directly recollected, and made the reconstructed 
words more likely to be output based on perceived familiarity.  To sum up, list-level JOLs 
turned out to be a sledgehammer operation for target-target related pairs in free recall, 
enhancing both item-specific verbatim processing and relational gist processing. 
Moreover, an intriguing contrasting pattern is observed in the D, R, and J parameters 
between associative recall and free recall.  First, for target-target related pairs, the ordering for 
the D parameter among the three JOL conditions was reversed for associative recall (item-JOL > 
list-JOL > no-JOL) compared to free recall (list-JOL > no-JOL > item-JOL).  Item-level JOLs 
marginally improved direct verbatim access relative to the no-JOL condition in 
associative recall, but they reduced it in free recall.  In contrast, list-level JOLs did not affect 
direct access during associative recall, but they enhanced it during free recall.  Note that item-
level JOLs should prompt participants to focus more on item-specific processing and divert them 
from inter-item relational processing, whereas list-level JOLs should favor inter-item relational 
processing over item-specific processing.  Meanwhile, associative recall is sensitive to item-
specific cues, whereas free recall relies heavily on inter-item relational processing.  Thus, it 
appears that JOLs only boosted the D parameter when there was consistency in the cue 
preference between JOLs and the memory test, such as when list-level JOLs are followed by free 
recall or when item-level JOLs are followed by associative recall.   
Second, list-level JOLs reduced both the R and J parameters for target-target unrelated 
pairs in associative recall, but they increased both R and J parameters for target-target related 
pairs in free recall.  In the former scenario, associative recall favors item-specific cues rather 
than inter-item relations.  However, list-level JOLs prompted participants to focus on inter-item 
relations instead of item-specific features, when the target words between consecutive pairs were 
not meaningfully related.  Therefore, list-level JOLs misguided the encoding process for target-
target unrelated pairs, which in turn disrupted the reconstruction operation for those items and 
discouraged participants from outputting the reconstructed items due to low levels of perceived 
familiarity.  In contrast, in the latter scenario, free recall relies heavily on inter-item 
relational cues.  Because the target words of target-target related pairs shared categorical 
membership, list-level JOLs effectively facilitated the processing of inter-item relational cues.  
Thus, the strengthened relational information made the target words of target-target related 
pairs easier to reconstruct when they could not be recollected and made the 
reconstructed items more likely to be output because they seemed more familiar.  To 
sum up, the different effects of item- and list-level JOLs on the D, R, and J parameters across test 
formats suggest that JOL reactivity depends heavily on transfer appropriateness, which aligns 
with the cue-strengthening hypothesis. 
  
CHAPTER 5 
EXPERIMENT 4 
As mentioned previously, Stevens and Pierce (2019; Experiment 2) found no reactive 
effects of item-level JOLs on recall of categorical word lists.  Nevertheless, they reported in their 
Experiment 3 that list-level JOLs produced significant recall improvement relative to the no-JOL 
condition.  Thus, they concluded that item-level JOLs produced no reactivity on recall for 
categorical lists, but list-level JOLs produced positive reactivity.  However, Senkova and Otani 
(2021) reported the contradictory pattern that item-level JOLs produced positive reactivity on 
recall for categorical lists.  Moreover, they proposed that the positive item-level JOL reactivity 
results from enhanced item-specific processing, because they found comparable levels of 
memory improvement between item-level JOLs and two typical item-processing manipulations 
(Experiment 1: pleasantness rating; Experiment 2: mental imagery).   
A methodological discrepancy between the two studies may contribute to the inconsistent 
findings.  In Stevens and Pierce’s (2019) experiments, the categorical lists were presented in a 
blocked manner such that words that belong to the same category were always presented 
consecutively. However, in Senkova and Otani’s (2021) experiments, the order of words was 
randomized across the categorical lists, so that words that belong to the same category were not 
presented consecutively.  Thus, the reactivity of item-level JOLs on recall for categorical lists may be 
constrained by list organization (randomized vs. blocked). 
Experiment 4 was designed to reconcile the mixed findings between Stevens and Pierce 
(2019) and Senkova and Otani (2021) and to revisit Senkova and Otani’s item-specific 
hypothesis.  Regarding the first aim of Experiment 4, I examined whether the two previous 
findings can be replicated within a single experiment.  One is Senkova and Otani’s finding that 
item-level JOLs enhanced recall for categorical lists when the lists were presented in a 
randomized format.  The other is Stevens and Pierce’s finding that item-level JOLs failed to 
affect recall for categorical lists when the lists were presented in a blocked format.  If both 
findings were replicated in the current experiment with standardized word lists and procedures, I 
would be able to attribute the contradictory results to the difference in list organization.   
Regarding the second aim of Experiment 4, according to Senkova and Otani’s item-
specific hypothesis, item-level JOLs improve recall for categorical lists by enhancing item-
specific processing, which is not readily activated with categorical lists because such lists favor 
relational processing.  If reactivity of item-level JOLs is indeed driven by enhanced item-specific 
processing, then positive reactivity should be observed in both blocked and randomized 
categorical lists.  Additionally, as Senkova and Otani noted in the discussion of their findings, 
similar performance between the item-JOL conditions and the conditions that are known to 
enhance item-specific processing (pleasantness rating or mental imagery) does not guarantee 
similar underlying processes.  However, the use of the dual-retrieval model in the current study 
can remove such uncertainty by delivering quantitative parameters for separate underlying 
processes.  Thus, if enhanced item-specific processing can account for reactivity of item-level 
JOLs, the effects of item-level JOLs should be located in parameters that pertain to item-specific 
processing, namely the direct access (D) or forgetting of direct access (F) parameters.  
In Experiment 4, the organization of categorical lists (randomized vs. blocked) and JOL 
condition (item-JOL, list-JOL, and no-JOL) were factorially manipulated, with both being 
manipulated between subjects.  In addition to item-level JOLs, list-level JOLs were administered 
as in Stevens and Pierce’s study, in an attempt to replicate their finding of positive 
reactivity of list-level JOLs on recall for blocked categorical lists.  Additionally, list-level JOLs 
were not expected to produce positive reactivity on recall for randomized categorical lists, given 
that words from different categorical lists were intermixed and thus there were no coherent 
categorical relations within individual lists. 
Method 
Participants 
Participants were 240 young adults (Mage = 24.02, SDage = 4.44) recruited from Prolific.  
Participants were all fluent English speakers who were located in the United States, Canada, or 
the United Kingdom, and they were paid $2.33 per person for participation.  Participants were 
first randomly assigned to either the item-JOL condition, the list-JOL condition, or the no-JOL 
condition.  Then, within each of the three JOL conditions, participants were randomly assigned 
to either a randomized list condition or a blocked list condition.  Among the 80 participants who 
were randomly assigned to the item-JOL condition, 45 participants were assigned to the blocked 
list condition, and 35 were assigned to the randomized list condition.  Among the 81 participants 
who were randomly assigned to the list-JOL condition, 36 participants were assigned to the 
blocked list condition, and 45 were assigned to the randomized list condition.  Among the 79 
participants who were randomly assigned to the no-JOL condition, 42 participants were assigned 
to the blocked list condition, and 37 were assigned to the randomized list condition.  The 
participant assignment was slightly imbalanced between randomized and blocked list conditions 
due to a technical error in Qualtrics, but the sample size in all conditions of Experiment 4 was 
comparable to or larger than that in Senkova and Otani (2021; Experiment 1). 
Materials 
The experiment was programmed and administered via Qualtrics.  The study material 
was a 40-word list, which consisted of words from five 8-word categorical lists.  I used the four 
categorical lists that were used in Senkova and Otani (2021) and added another categorical list, 
which was similarly constructed based on the Van Overschelde et al. (2004) category norms (see 
Appendix E).  In the blocked condition, the words on each of the five categorical lists were 
presented consecutively.  In the randomized condition, the words on the five categorical lists 
were randomly mixed and grouped into five new lists, with the constraint that no more than three 
consecutive words were from the same categorical list.  For both the blocked and randomized 
lists, the order of words within each list was fixed across all participants, while the order of lists 
was randomized for each participant.  The word lists used in Experiment 4 are displayed in 
Appendix E. 
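The construction of the randomized lists can be sketched as follows.  This is a minimal illustration under stated assumptions, not the procedure actually used to build the materials: the word labels are placeholders (the real words appear in Appendix E), the function names are hypothetical, the constraint is checked within each new 8-word list, and rejection sampling is simply one convenient way to satisfy the constraint.

```python
import random

def longest_run(labels):
    """Length of the longest run of identical consecutive labels."""
    longest, run, prev = 0, 0, None
    for label in labels:
        run = run + 1 if label == prev else 1
        longest = max(longest, run)
        prev = label
    return longest

def build_randomized_lists(categorical_lists, list_length=8, max_consecutive=3, seed=2022):
    """Mix all words and regroup them into new lists of list_length words.

    Reshuffles until no new list contains more than max_consecutive
    consecutive words from the same category (rejection sampling).
    """
    rng = random.Random(seed)
    pool = [(word, category)
            for category, words in categorical_lists.items()
            for word in words]
    while True:
        rng.shuffle(pool)
        new_lists = [pool[i:i + list_length] for i in range(0, len(pool), list_length)]
        if all(longest_run([cat for _, cat in lst]) <= max_consecutive for lst in new_lists):
            return new_lists

# Placeholder materials: five categories of eight hypothetical words each.
placeholder_lists = {
    f"category_{c}": [f"word_{c}_{w}" for w in range(1, 9)]
    for c in range(1, 6)
}
randomized_lists = build_randomized_lists(placeholder_lists)
```

Because the word order within each list was fixed across participants, a single run of such a routine would suffice; only the order of the resulting lists would then be shuffled per participant.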
Procedure 
Participants were randomly assigned to either the item-JOL condition, the list-JOL 
condition, or the no-JOL condition.  All participants completed a study phase and a test phase.  
In the study phase, participants studied 40 words, with each word presented for 2 seconds.  In the 
item-JOL condition, after each word was presented for 2 seconds, the word disappeared and a 
JOL prompt (“Likelihood to recall?”) appeared.  Participants were required to rate how likely 
they were to recall the word on a later memory test (from 0-100, with 0 = not likely at all and 100 = 
totally likely), and they were told to fine-tune their judgments by using the whole 100-point 
percentage scale.  Participants were given a maximum of 4 seconds to make their JOLs, and they 
needed to type their responses into a blank box under the JOL prompt.  When the 4 seconds were up, 
the program automatically proceeded to the next word.  In the no-JOL condition, the only 
difference from the item-JOL condition was that the JOL task was replaced by a random number 
generating task as in Senkova and Otani (2021).  Specifically, I asked participants to generate a 
random number between 0 and 100 within 4 seconds after the presentation of each word.  In the 
list-JOL condition, participants were also required to generate a random number after each word 
was studied, just like in the no-JOL condition.  In addition, after each list of eight words was 
presented, participants were prompted to make a list-level JOL during a 10-s interval between 
consecutive lists (“How many words do you expect to remember from the list on a later memory 
test?”).  Participants were required to enter a whole number between 0 and 8 into a blank box.  
The procedure for the test phase was the same as the procedure for free recall tests in Experiment 
3. 
Results 
ANOVA Results for JOLs 
To examine the effects of list organization (blocked vs. randomized) on JOLs, I 
conducted two separate one-way ANOVAs for item- and list-level JOLs, respectively.  The 
effect of list organization on item-level JOLs approached significance, F(1, 78) = 2.96, MSE = 
305.31, ηp² = .04, p = .089, with item-level JOLs being marginally higher for blocked categorical 
lists (M = 57.36, SD = 15.46) than for randomized categorical lists (M = 50.58, SD = 18.88).  
Meanwhile, list-level JOLs were significantly higher for blocked categorical lists (M = 4.89, SD 
= 1.19) than for randomized categorical lists (M = 3.73, SD = 1.06), F(1, 79) = 20.89, MSE = 
1.29, ηp² = .21, p < .001.   
Similar to Experiment 3, I converted list-level JOLs to a 0-100 scale and conducted an 
additional 2 (List organization: blocked, randomized) × 2 (JOL condition: item-JOL, list-JOL) 
between-subjects ANOVA to inspect whether list-level JOLs were more sensitive to list 
organization than item-level JOLs.  The ANOVA showed only a main effect of list organization, 
F(1, 157) = 17.77, MSE = 253.31, ηp² = .10, p < .001, but no List organization × JOL condition 
interaction, suggesting that there was no significant difference between item- and list-level JOLs’ 
sensitivity to list organization. 
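To make the rescaling step concrete, the sketch below converts a list-level JOL (a count out of eight words) to the 0-100 scale used for item-level JOLs.  The linear conversion (proportion of the list multiplied by 100) is my assumption about how the rescaling was done, and the function name is hypothetical.

```python
def rescale_list_jol(words_expected, list_length=8):
    """Convert a list-level JOL (number of words out of list_length) to a 0-100 scale."""
    return 100.0 * words_expected / list_length

# Because the conversion is linear, rescaling the reported condition means
# gives the same result as averaging the rescaled individual JOLs.
blocked_mean = rescale_list_jol(4.89)     # mean list-level JOL, blocked lists
randomized_mean = rescale_list_jol(3.73)  # mean list-level JOL, randomized lists

print(blocked_mean, randomized_mean)
```

On this assumed conversion, the list-level means fall in the same general range as the item-level means (57.36 and 50.58), which is what makes the combined 2 × 2 analysis interpretable.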
ANOVA Results for Recall 
I first conducted a 2 (List organization: blocked, randomized) × 3 (JOL condition: item-
JOL, list-JOL, no-JOL) × 3 (Test: 1, 2, 3) mixed ANOVA for recall.  The ANOVA revealed a 
main effect of list organization, F(1, 234) = 13.24, MSE = .11, ηp² = .05, p < .001, a main effect 
of JOL condition, F(2, 234) = 3.14, MSE = .11, ηp² = .03, p = .045, and a main effect of test, F(2, 
468) = 5.00, MSE = .004, ηp² = .02, p = .007.  As can be seen in Figure 5.1, the main effects were 
driven by the fact that recall was higher for blocked categorical lists (M = .45, SD = .21) than for 
randomized categorical lists (M = .37, SD = .18), higher in the item-JOL condition (M = .45, SD 
= .17) than in the list-JOL condition (M = .38, SD = .23), and higher on the first recall test (M 
= .42, SD = .19) than on the second (M = .40, SD = .20) or third recall test (M = .41, SD = .21).  
Additionally, there was a JOL condition × Test interaction, F(4, 468) = 2.99, MSE = .004, ηp² 
= .03, p = .019, although post hoc tests showed that the JOL condition effect was significant 
across all three recall tests.  Last, the interaction that is of primary interest, the JOL condition × 
List organization interaction, approached significance, F(2, 234) = 2.49, MSE = .11, ηp² = .02, p 
= .085.   
Figure 5.1. Free recall for blocked categorical lists and randomized categorical lists across the 
item-, list-, and no-JOL conditions in Experiment 4.  Panel A = recall test 1.  Panel B = recall test 
2.  Panel C = recall test 3.  Panel D = average recall across all three tests.  Error bars are based on 
SEs. 
Recall that one of the aims of Experiment 4 was to replicate Senkova and Otani’s (2021) 
finding, in which recall for randomized categorical lists was higher in the item-JOL condition 
than in the no-JOL condition.  Meanwhile, I also hypothesized that list-level JOLs would not 
enhance recall for randomized categorical lists.  Therefore, although the JOL condition × List 
organization interaction did not reach the conventional criterion of statistical significance, I still 
conducted a planned one-way ANOVA to compare recall between the item-JOL, list-JOL, and 
no-JOL conditions specifically for randomized categorical lists.  Here, because recall results 
were significantly different across the three test cycles, I only included test 1 data in this planned 
analysis for comparison with Senkova and Otani, as they had administered only a single test 
cycle.  Likewise, I used least significant difference (LSD) tests for post hoc analyses, as 
Senkova and Otani did.   
An inspection of Figure 5.1 revealed that test 1 recall data displayed a pattern very similar 
to that of the average recall across tests 1-3.  The one-way ANOVA showed that the 
main effect of JOL condition was significant, F(2, 115) = 4.80, MSE = .03, ηp² = .08, p = .010.  
LSD tests suggested that the item-JOL condition (M = .44, SD = .16) produced higher recall for 
randomized categorical lists than both the list-JOL condition (M = .32, SD = .18) and the no-JOL 
condition (M = .36, SD = .18), ps = .004 and .039.  Therefore, I successfully replicated Senkova 
and Otani’s result that the item-JOL condition produced better recall for randomized categorical 
lists compared to the no-JOL condition.  Additionally, as predicted, there was no difference in 
recall for randomized categorical lists between the list- and no-JOL conditions, indicating that 
list-level JOLs produced no reactivity in this scenario. 
Another aim of Experiment 4 was to replicate Stevens and Pierce’s (2019) finding that 
the item-JOL condition did not improve recall for blocked categorical lists compared to the no-
JOL condition but the list-JOL condition did.  Therefore, I also conducted a separate one-way 
ANOVA to compare the recall between item-, list-, and no-JOL conditions specifically for 
blocked categorical lists.  I again restricted this analysis to test 1 data for comparison, as Stevens 
and Pierce only administered one test cycle.  As shown in Figure 5.1, the recall for blocked 
categorical lists in test 1 seemed comparable across the three JOL conditions.  Indeed, the 
ANOVA showed that there was no difference in recall for blocked lists between the item-JOL 
(M = .47, SD = .17), list-JOL (M = .48, SD = .20), and no-JOL conditions (M = .44, SD = .17), 
F(2, 119) = .50, MSE = .03, ηp² = .008, p = .606.  Therefore, although I found the same result as 
Stevens and Pierce that item-level JOLs did not enhance recall for blocked lists, I did not 
replicate the recall enhancement they found in the list-JOL condition.   
Model Results 
The free recall data in Experiment 4 were fit to a slightly modified dual-retrieval model 
relative to Experiments 1, 2, and 3, to accommodate the methodological differences between 
Experiment 4 and the prior three experiments.  The prior three experiments all used word pairs 
followed by associative recall tests (although Experiment 3 used both associative and free recall 
tests), whereas Experiment 4 used single-word lists followed by free recall tests.  The modified 
model was developed specifically for free recall tests for lists of single words, and it has the 
same six parameters as the previous model: D, F, R, J1, J2, and J3.  The only difference from the 
previous model lies in the F parameter, which is now defined as the probability of forgetting on 
the second or third recall test.  While the previous model assumes forgetting can only occur 
jointly on both recall tests 2 and 3, the modified model allows for the possibility that participants 
still had direct access on the second recall test but they lost it on the third recall test, and the 
probability of forgetting was assumed to be equal between the last two recall tests (see Appendix 
A for more details).  As can be seen in Table 5.1, this modified model delivered excellent fits to 
the recall data across all possible combinations between JOL conditions (item-, list-, and no-
JOL) and list organization (blocked, randomized) except for blocked lists in the list-JOL 
condition.  The average G2(1) of 3.12 was still below the critical value of 3.84, suggesting that 
the model provided acceptable fits to the current data.   
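The fit summary above can be reproduced from the G2 values listed in Table 5.1.  The short sketch below, which uses hypothetical variable names and is not part of the original modeling code, averages the six G2 values and flags any condition whose individual G2 exceeds the critical value of 3.84.

```python
CRITICAL_G2 = 3.84  # critical value cited in the text for G2(1)

# G2 values taken from Table 5.1 (list organization, JOL condition).
g2_by_condition = {
    ("Blocked", "Item-JOL"): 0.10,
    ("Blocked", "List-JOL"): 14.35,
    ("Blocked", "No-JOL"): 0.37,
    ("Randomized", "Item-JOL"): 0.22,
    ("Randomized", "List-JOL"): 3.63,
    ("Randomized", "No-JOL"): 0.04,
}

mean_g2 = sum(g2_by_condition.values()) / len(g2_by_condition)

# Conditions whose individual fit statistic exceeds the critical value.
poor_fits = [cond for cond, g2 in g2_by_condition.items() if g2 > CRITICAL_G2]

print(round(mean_g2, 2), poor_fits)
```

The average is pulled upward almost entirely by the single blocked list-JOL cell, so the acceptable average fit should be read alongside that one exception.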
For the blocked categorical lists, the F parameter was lower in the item-JOL condition 
(.05) than in the list-JOL and no-JOL conditions (.10 and .09), ∆G²s > 8.44, ps < .004.  This 
suggests that item-level JOLs functioned as a buffer against forgetting for blocked categorical 
lists.  Meanwhile, the J2 parameter was higher in the list-JOL condition than in the item-JOL 
condition, ∆G² = 4.47, p = .034, suggesting that words followed by list-level JOLs felt more 
familiar in the later recall tests relative to those followed by item-level JOLs.   
Table 5.1 
Dual-Retrieval Model Fits and Parameter Estimates for Experiment 4 
List organization  JOL condition  G²     D    F    J1   J2   J3   R
Blocked            Item-JOL       .10    .42  .05  .44  .54  .80  .20
                   List-JOL       14.35  .42  .10  .50  .70  .78  .20
                   No-JOL         .37    .38  .09  .45  .63  .80  .22
Randomized         Item-JOL       .22    .36  .06  .58  .64  .83  .22
                   List-JOL       3.63   .22  .10  .81  .65  .91  .16
                   No-JOL         .04    .31  .07  .41  .62  .86  .17
Note. D = direct access parameter; F = forgetting parameter; J1 = familiarity judgment 
parameter for test 1; J2 = familiarity judgment parameter for test 2; J3 = familiarity 
judgment parameter for test 3; R = reconstruction parameter.  Parameters that differed 
reliably between JOL conditions are printed in boldface. 
 
The patterns were quite different for the randomized lists.  Here, the ordering for the D 
parameter was item-JOL condition (.36) > no-JOL condition (.31) > list-JOL condition (.22), 
with all pairwise comparisons being significant except that the difference between the item-JOL 
and no-JOL conditions was on the boundary of statistical significance, ∆G²s ≥ 3.84, ps ≤ .050.  
This suggests that item-level JOLs enhanced direct access to verbatim traces, whereas list-level 
JOLs impaired it, relative to the no-JOL condition.  Meanwhile, the ordering of the J1 parameter 
was list-JOL condition (.81) > item-JOL condition (.58) > no-JOL condition (.41).  All pairwise 
comparisons yielded significant differences, ∆G²s > 4.11, ps < .043, suggesting that item-level 
and list-level JOLs both increased familiarity for reconstructed words on the randomized lists. 
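The p values reported for the ∆G² tests in this section are consistent with a chi-square reference distribution with one degree of freedom.  That degree of freedom is an assumption here, because it is not restated alongside each test.  Under that assumption, the conversion from ∆G² to p can be sketched with the standard library alone; the function name is hypothetical.

```python
import math


def p_from_delta_g2(delta_g2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    Assumes each parameter comparison constrains a single parameter (df = 1).
    For df = 1 the survival function reduces to erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(delta_g2 / 2.0))


# Delta-G2 values quoted in the Experiment 4 model results.
for delta_g2 in (3.84, 4.11, 4.47, 8.44):
    print(f"delta G2 = {delta_g2:5.2f} -> p = {p_from_delta_g2(delta_g2):.4f}")
```

A one-degree-of-freedom test fits the case in which one parameter is constrained to be equal across two conditions, which is how the pairwise comparisons above are described.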
Discussion 
In Experiment 4, supporting evidence was found for my hypothesis about the discrepant 
findings between Stevens and Pierce (2019) and Senkova and Otani (2021).  Namely, reactivity 
of item-level JOLs on free recall for categorical lists was bounded by list organization: Item-
level JOLs produced positive reactivity when categorical lists were presented in a randomized 
manner but not in a blocked manner.  The dual-retrieval model analyses revealed that the recall 
advantage for randomized lists in the item-JOL condition was driven by the enhancement in both 
the D and J parameters.  That is, the item-JOL condition provided better access to the verbatim 
traces of words’ presentations and increased the tendency for familiarity judgment to pass 
reconstructed words for output, relative to the no-JOL condition.   
Because verbatim traces contain literal surface details of specific items, my result seems 
in harmony with Senkova and Otani’s hypothesis that item-level JOLs enhanced memory by 
improving item-specific processing.  However, Senkova and Otani’s item-specific hypothesis 
would have difficulty explaining why item-level JOLs enhanced recall for randomized but not 
for blocked categorical lists.  According to this hypothesis, categorical lists encouraged 
participants to engage in relational processing whereas uncategorical lists promoted item-specific 
processing.  Thus, if item-level JOLs enhanced item-specific processing, they should improve 
memory for categorical lists, where such processing is not already solicited, more than for 
uncategorical lists, where such processing is readily provoked.  If that is the case, the item-
specific hypothesis predicts positive JOL reactivity for categorical lists regardless of whether lists are 
randomized or blocked.  Notably, item-level JOLs should have more robust positive reactivity 
with blocked categorical lists, because blocked categorical lists induce even stronger relational 
processing than randomized categorical lists, in which case item-specific processing should be 
more beneficial for memory performance.   
Why did reactivity of item-level JOLs only occur in randomized categorical lists but not 
in blocked categorical lists? One possible explanation offered by the model analysis is that 
positive JOL reactivity for randomized categorical lists results from a combination of enhanced 
recollection of item-specific details and enhanced familiarity based on relational gist.  As can be 
seen in Table 5.1, item-level JOLs improved both verbatim-based and gist-based retrieval 
processes (D and J1) for the randomized lists, whereas they only affected one verbatim-based 
process (F) for blocked lists.  Moreover, for randomized lists, the difference in the D parameter 
between the item- and no-JOL conditions was quite small and on the boundary of statistical 
significance, but the difference in the J1 parameter was much larger and highly significant.  Thus, 
it is possible that although item-level JOLs did improve item-specific processing, the 
improvement in relational processing, which increased the likelihood of outputting reconstructed 
words based on familiarity, was a necessary contributor to positive reactivity of item-level JOLs 
for randomized categorical lists.  If that is the case, it is obvious that the enhancement in 
relational processing should be more beneficial for randomized than for blocked lists, because 
such processing is more readily solicited by the latter than by the former.  That is, with 
categorically related words presented consecutively, participants would naturally focus on the 
meaningful connections among list words, but when categorically related words were not presented 
consecutively, participants would need more cognitive resources to grasp the semantic relations 
among words and then regroup those words under a common theme.  Therefore, if positive 
reactivity of item-level JOLs for categorical lists was partially driven by relational processing, it 
should be stronger with the randomized than with the blocked categorical lists.  However, it 
should be acknowledged that this explanation is post hoc and speculative and needs to be 
examined further in future research. 
Meanwhile, it was predicted that list-level JOLs should not produce reactivity for 
randomized categorical lists, because list-level JOLs direct participants’ attention to the relations 
among the words on the same list when these words were not meaningfully related.  Thus, there 
were no useful cues strengthened in the process of making list-level JOLs, failing the 
precondition of JOL reactivity proposed by the cue-strengthening hypothesis.  The results in 
Experiment 4 were consistent with this prediction.  Experiment 4 also showed that 
neither item- nor list-level JOLs enhanced recall for blocked categorical lists.  Here, the former 
finding was consistent with Stevens and Pierce’s (2019) finding whereas the latter was not.  One 
possible reason why the list-level JOL reactivity for blocked lists was not replicated is the 
difference in test format: Stevens and Pierce used cued recall in their experiments, whereas I 
used free recall in the current experiment.  Note that, unlike free recall, cued recall provided 
categorical labels as test cues, which facilitated relational processing.  Since list-level JOLs 
slant participants toward relational processing, cued recall should be more sensitive to the cues 
strengthened by list-level JOLs than free recall is.  Therefore, reactivity of list-level 
JOLs may have been too subtle to be captured by free recall in the current experiment, even though it could 
be picked up by cued recall in Stevens and Pierce’s (2019) experiment.  Of course, other factors 
may also come into play, such as differences in study materials and sample characteristics, which 
will need to be determined by further replications. 
On a related note, another controversy arises from the different findings for reactivity of 
the list-level JOLs between Experiments 3 and 4.  Given that list-level JOLs were found to 
improve free recall for target-target related pairs in Experiment 3, it was quite surprising that the 
free recall benefits evaporated when study materials were changed from word pairs to word lists.  
In that connection, it is noteworthy that word pairs should primarily invite participants to process 
the relation between cue and target within each pair, whereas blocked categorical word lists 
should primarily provoke participants to focus on the relations among individual words.  
Therefore, in Experiment 3, list-level JOLs encouraged participants to process the target-target 
relatedness among word pairs, a cue that was not prioritized by the word pairs per 
se.  However, in Experiment 4, list-level JOLs should produce less improvement in relational 
processing because such processing is already strongly encouraged by blocked categorical lists 
themselves.  In other words, list-level JOLs stimulate complementary processing with target-
target word pairs but not with blocked categorical lists, which may explain why list-level JOLs 
produced positive reactivity in the former situation but not in the latter.   
CHAPTER 6 
GENERAL DISCUSSION 
In the present dissertation, I examined the underlying mechanism of JOL reactivity by (a) 
testing the predictions of major theoretical hypotheses about JOL reactivity and by (b) 
identifying which retrieval processes were modified by the solicitation of JOLs.  To achieve 
these aims, I pitted the two leading theoretical accounts, the changed-goal hypothesis (Mitchum 
et al., 2016) and the cue-strengthening hypothesis (Soderstrom et al., 2015), against each other in 
Experiments 1 and 2 and tested further predictions of the cue-strengthening hypothesis in 
Experiment 3.  In Experiment 4, I tested the recently proposed hypothesis that positive 
reactivity of item-level JOLs arises from enhanced item-specific processing (Senkova & Otani, 
2021).  Moreover, I implemented the dual-retrieval model to estimate underlying retrieval 
processes and tested which processes were significantly different between the conditions with 
JOLs and the condition without JOLs.   
Below, I first present a brief review of the experimental design, theoretical predictions, 
and behavioral findings in each of the four experiments.  It turned out that the first three 
experiments offered preferential support for the cue-strengthening hypothesis rather than the 
changed-goal hypothesis, while Experiment 4 provided counterevidence for the item-specific 
hypothesis.  Then, I address what the model analyses across the four experiments reveal about 
the process-level mechanism for JOL reactivity.  Last, I discuss the theoretical implications of 
those findings and the recommendations for future research. 
Summary of Main Methodologies, Hypotheses, and Behavioral Findings 
In Experiment 1, I compared the reactive effects of JOLs between strongly related, 
weakly related, and identical word pairs by having participants either make item-level JOLs or 
make no JOLs for a mixed list of the three types of word pairs.  Here, the changed-goal 
hypothesis assumes that making JOLs highlights the differences in learning difficulty among 
items and prompts participants to focus more on learning the least and moderately challenging 
items at the cost of the most difficult items.  Thus, it predicts negative reactivity for weakly 
related pairs and positive reactivity for strongly related and identical pairs.  However, the cue-
strengthening hypothesis predicts positive reactivity for all three types of word pairs, because it 
assumes that making JOLs can enhance the processing of cues that inform JOLs (i.e., cue-target 
relation and cue-target identity in this scenario) and positive reactivity should arise if the 
strengthened cues are useful in subsequent memory tests.  The results of Experiment 1 showed 
that associative recall was better in the item-JOL condition than in the no-JOL condition, and the 
effect of JOL condition was not moderated by word pair type.  In other words, JOLs improved 
recall performance for strongly related, weakly related, and identical word pairs to a similar 
extent.  Obviously, this result is in line with the cue-strengthening hypothesis but not the 
changed-goal hypothesis.   
Experiment 2 was meant to test the contrasting predictions about the reactive effect of 
prestudy JOLs between the changed-goal and cue-strengthening hypotheses.  Unlike immediate 
JOLs, which are made after studying each item, prestudy JOLs are made before studying each 
item but with specific information provided for the coming item.  In Experiment 2, participants 
were told whether they were going to study a related or unrelated pair when making prestudy 
JOLs.  Like immediate JOLs, prestudy JOLs were significantly higher for related pairs than for 
unrelated pairs (Mueller et al., 2013, 2016), suggesting that participants were aware that related 
pairs are more memorable than unrelated pairs when making prestudy JOLs.  Thus, the changed-
goal hypothesis predicts similar reactivity between prestudy JOLs and immediate JOLs.  That is, 
making prestudy JOLs should similarly change participants’ learning goals and motivate them to 
focus more on learning related pairs at the expense of unrelated pairs, which should ultimately produce 
negative reactivity for unrelated pairs but positive reactivity for related pairs.   
However, the cue-strengthening hypothesis predicts either no or very weak reactivity of 
prestudy JOLs.  The reason is that prestudy JOLs were formed based on very limited intrinsic 
and extrinsic cues.  As an illustration, in Experiment 2, participants received a homogeneous 
prompt for all related pairs (“you are going to study a related pair”), which provided no item-
specific cues that could help them recall the particular target for a given cue on the later test.  
Moreover, participants may use various encoding strategies when studying the word pairs, such 
as interactive imagery (Wilton, 2006) or verbal elaboration (Jensen & Rohwer, 1963).  Although 
participants might have a global sense of what strategies they would use when making prestudy 
JOLs, the strategy implemented for a specific pair would not be accessible until the pair was 
encoded.  Even worse, prestudy JOLs could not possibly be based on mnemonic cues, because 
those cues, such as feelings of fluency or familiarity, are embedded in the encoding experience 
itself.  Thus, the cue-strengthening hypothesis predicts that because prestudy JOLs are 
made based on fewer diagnostic cues, prestudy JOLs should have either no reactivity or much 
weaker reactivity than immediate JOLs.  Again, my results were consistent with the cue-
strengthening hypothesis rather than the changed-goal hypothesis, as prestudy JOLs produced no 
reactivity for either related or unrelated pairs, whereas immediate JOLs produced positive reactivity 
for related pairs but no reactivity for unrelated pairs. 
Experiment 3 targeted the cue-strengthening hypothesis.  Unlike Experiments 1 and 2, in 
Experiment 3, semantic relation was not manipulated within pairs but between pairs.  Namely, 
there was no relatedness between cue and target within each pair, but there was either categorical 
relation or no relation between the target words among consecutive pairs.  Meanwhile, I solicited 
both item-level JOLs and list-level JOLs in comparison to the no-JOL control condition.  The 
former were made after each word pair, whereas the latter were made after each list of four word pairs.  
Last, I administered either associative or free recall tests.  Based on the cue-strengthening 
hypothesis, JOL reactivity only arises when (a) JOLs are capable of strengthening the cues 
embedded in the study materials and (b) the final test is sensitive to the cues that are 
strengthened by JOLs.  Accordingly, item-level JOLs should produce negative or little-to-no 
reactivity for target-target related pairs in free recall because they primarily enhance the 
processing of within-pair relation rather than inter-pair relation, whereas free recall is more 
sensitive to the latter than the former (Mitchum et al., 2016; Myers et al., 2020).  Item-level JOLs 
should also produce little-to-no reactivity for target-target related pairs in associative recall, as 
there was no cue-target relatedness, which associative recall favors, in those pairs.  However, 
list-level JOLs should produce positive reactivity for target-target related pairs in free recall but 
not in associative recall, as they primarily strengthen processing of target-target relatedness, 
while only free recall is sensitive to such cues.  Again, my results are consistent with these 
predictions: List-level JOLs improved free recall but not associative recall for target-target 
related pairs, and item-level JOLs had no effects on either associative or free recall for target-
target related pairs. 
In Experiment 4, I tested the predictions of a recently proposed item-specific hypothesis 
(Senkova & Otani, 2021) and revisited the contradictory findings of Senkova and Otani (2021) 
versus Stevens and Pierce (2019).  On the one hand, Senkova and Otani (2021) found positive 
reactivity of item-level JOLs on recall for randomized categorical lists.  On the other hand, 
Stevens and Pierce (2019) found no reactivity of item-level JOLs on recall for blocked categorical 
lists, but they found positive reactivity of list-level JOLs.  In Experiment 4, I again administered 
the three JOL conditions used in Experiment 3: item-JOL, list-JOL, and no-JOL conditions.  
Meanwhile, I used categorical lists as study materials and presented them in either a randomized 
format or a blocked format.  If reactivity of item-level JOLs results from the enhanced item-
specific processing, which is complementary to the relational processing naturally provoked by 
categorical lists, item-level JOLs should improve recall for both randomized and blocked 
categorical lists.  However, Experiment 4 replicated both Senkova and Otani’s and Stevens and 
Pierce’s findings in that positive reactivity of item-level JOLs occurred only for randomized but 
not for blocked categorical lists, which does not agree with the prediction of the item-specific 
hypothesis.  Additionally, I did not replicate Stevens and Pierce’s finding that list-level JOLs 
improve recall for blocked categorical lists, which is possibly due to the differences in the 
memory test format, since I used free recall tests whereas Stevens and Pierce used cued recall 
tests (i.e., free recall tests with categorical labels presented as test cues).   
A summary of the experiment designs and main findings for recall in Experiments 1-4 
is presented in Table 6.1.  For a summary of the main theoretical predictions in Experiments 
1-4, readers can refer back to Table 1.3.  A comparison between Table 1.3 and Table 6.1 reveals 
that Experiments 1, 2, and 3 provide converging support for the cue-strengthening hypothesis 
rather than for the changed-goal hypothesis.  Additionally, Experiment 4 did not lend support to 
the item-specific hypothesis. 
  
Table 6.1 
A Summary of the Experiment Designs and Recall Findings for Experiments 1-4 
Exps  Experiment Design                       Main Recall Findings
1     3 (Pair type: weakly related,           Identical pairs: Item-JOL > No-JOL
      strongly related, identical) ×          Strongly related pairs: Item-JOL > No-JOL
      2 (JOL condition: item-JOL, no-JOL)     Weakly related pairs: Item-JOL > No-JOL

2     2 (Pair type: related, unrelated) ×     Related pairs: Immediate-JOL > Prestudy-JOL = No-JOL
      3 (JOL condition: prestudy-JOL,         Unrelated pairs: Immediate-JOL = Prestudy-JOL = No-JOL
      immediate-JOL, no-JOL)

3     2 (Target-target relation: related,     Target-target related pairs:
      unrelated) × 3 (JOL condition:          - Free recall: List-JOL > Item-JOL = No-JOL
      item-JOL, list-JOL, no-JOL)             - Associative recall: List-JOL = Item-JOL = No-JOL
                                              Target-target unrelated pairs:
                                              - Free recall: List-JOL = Item-JOL = No-JOL
                                              - Associative recall: List-JOL = Item-JOL = No-JOL

4     2 (List organization: blocked,          Blocked categorical lists: Item-JOL = List-JOL = No-JOL
      randomized) × 3 (JOL condition:         Randomized categorical lists: Item-JOL > List-JOL = No-JOL
      item-JOL, list-JOL, no-JOL)
Note.  Exps = Experiments.  JOL condition was manipulated between subjects throughout all 
four experiments.  The other variables were manipulated within subjects except for list 
organization in Experiment 4.  Item-JOL, immediate-JOL, prestudy-JOL, list-JOL, no-JOL all 
refer to the corresponding JOL conditions.  “=” means statistically equivalent recall (i.e., no 
reactivity), and “>” means significantly better recall (i.e., positive reactivity). 
 
Process-Level Mechanisms for JOL Reactivity 
The implementation of the dual-retrieval model allowed me to determine which retrieval 
processes are responsible for JOL reactivity.  As a reminder, the dual-retrieval model delivers 
estimates for two clusters of retrieval parameters: One is concerned with verbatim-based 
recollection, including the direct access (D) parameter and the forgetting (F) parameter.  The 
former represents the probability that the prior presentation of a specific item is vividly reinstated 
in mind so verbatim details of the item can be directly accessed.  The latter represents the 
probability of losing direct access to verbatim details due to forgetting after the first recall test.  
The other cluster is concerned with gist-based non-recollective operations, which includes the 
reconstructive (R) parameter and the familiarity judgment (J) parameters.  The former represents 
the probability of reconstructing an item based on partial information when recollection is not 
possible, and the latter represents the probability that the reconstructed item passes a familiarity 
threshold and is successfully outputted. 
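To make the roles of these parameters concrete, the verbal definitions above can be turned into a small calculation.  The sketch below assumes a simplified expression for the first recall test only: an item is recalled either through direct access (D) or, failing that, through reconstruction (R) followed by a positive familiarity judgment (J1).  This expression is an assumption inferred from the definitions in this section; the full model equations, including the role of F on later tests, are given in Appendix A and are not reproduced here.  The function name is hypothetical.

```python
def p_recall_first_test(d: float, r: float, j1: float) -> float:
    """Simplified probability of recalling an item on the first test.

    Assumption (not the full model): recall succeeds via direct access (d),
    or, when direct access fails, via reconstruction (r) followed by a
    positive familiarity judgment (j1). Forgetting (F) is omitted because
    it is defined as operating after the first recall test.
    """
    return d + (1.0 - d) * r * j1


# Parameter estimates for the randomized lists, copied from Table 5.1.
item_jol = p_recall_first_test(d=0.36, r=0.22, j1=0.58)
no_jol = p_recall_first_test(d=0.31, r=0.17, j1=0.41)

print(f"item-JOL: {item_jol:.3f}")
print(f"no-JOL:   {no_jol:.3f}")
```

Under this simplified expression, the item-JOL estimates imply a higher first-test recall probability than the no-JOL estimates, which agrees in direction with the positive reactivity reported for randomized lists.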
A summary of the dual-retrieval model results for Experiments 1-4 is presented in Table 
6.2.  An inspection of Table 6.2 reveals that there are both commonalities and differences in the 
dual-retrieval model results among the four experiments.  On the one hand, despite changes in 
the study materials, JOL type, or test format across the four experiments, positive JOL reactivity 
was always accompanied by increases in the D parameter.  In Experiment 1, positive JOL 
reactivity was accompanied by an increase in the D parameter as well as a reduction in the F 
parameter for strongly related, weakly related, and identical word pairs.  Additionally, there was 
an increase in the J parameters for weakly related pairs and an increase in the R parameter for 
identical pairs.  In Experiment 2, immediate JOLs produced positive reactivity for related pairs, 
which was located specifically in the D parameter.  In Experiment 3, list-level JOLs produced 
positive reactivity for target-target related pairs in free recall, which was the product of increases 
in the D, R, and J parameters.  In Experiment 4, enhancements in the D and J parameters jointly 
contributed to the positive reactivity of item-level JOLs for randomized categorical lists.  To sum 
up, it seems that enhanced direct access to verbatim details is a stable contributor to positive JOL 
reactivity.  Namely, when the solicitation of JOLs boosted subsequent memory performance, it 
was always tied to participants being better at recollecting the surface details of the prior 
presentation of studied items. 
Table 6.2 
A Summary of the Dual-Retrieval Model Findings for Experiments 1-4 
Exps  Main Dual-Retrieval Model Findings
1     Identical pairs            Weakly related pairs       Strongly related pairs
      D: Item-JOL > No-JOL       D: Item-JOL > No-JOL       D: Item-JOL > No-JOL
      F: Item-JOL < No-JOL       F: Item-JOL < No-JOL       F: Item-JOL < No-JOL
      R: Item-JOL > No-JOL       J3: Item-JOL > No-JOL

2     Related pairs                                 Unrelated pairs
      D: Immediate-JOL > Prestudy-JOL = No-JOL      D: Immediate-JOL = No-JOL > Prestudy-JOL
      F: Prestudy-JOL > Immediate-JOL = No-JOL      F: Prestudy-JOL > Immediate-JOL = No-JOL

3     Target-target related pairs                   Target-target unrelated pairs
      Associative recall:                           Associative recall:
      F: Item-JOL > List-JOL                        D: Item-JOL > List-JOL = No-JOL
      J2: Item-JOL > No-JOL = List-JOL              F: Item-JOL > List-JOL
      J3: Item-JOL > No-JOL = List-JOL              R: No-JOL > List-JOL > Item-JOL
                                                    J1: No-JOL > List-JOL
      Free recall:                                  J2: Item-JOL = No-JOL > List-JOL
      D: List-JOL > No-JOL > Item-JOL               J3: Item-JOL = No-JOL > List-JOL
      R: List-JOL = Item-JOL > No-JOL
      J2: Item-JOL > List-JOL > No-JOL              Free recall:
      J3: Item-JOL = List-JOL > No-JOL              Null finding

4     Blocked categorical lists                     Randomized categorical lists
      F: List-JOL > Item-JOL = No-JOL               D: Item-JOL ≥ No-JOL > List-JOL
      J2: List-JOL > Item-JOL                       J1: List-JOL > Item-JOL > No-JOL
Note.  Exps = Experiments.  Item-JOL, immediate-JOL, prestudy-JOL, list-JOL, no-JOL all 
refer to the corresponding JOL conditions.  D = direct access parameter; F = forgetting 
parameter; J1 = familiarity judgment parameter for test 1; J2 = familiarity judgment parameter 
for test 2; J3 = familiarity judgment parameter for test 3; R = reconstruction parameter.  “=” 
means statistically equivalent, “>” means significantly higher, “≥” means marginally higher, 
and “<” means significantly lower.  Parameters whose variations accompanied positive JOL 
reactivity at the behavior level were highlighted with boldface fonts. 
 
On the other hand, there was much evidence suggesting that the process-level patterns for 
JOL reactivity varied with the material type, JOL type, and test format.  To illustrate, in terms of 
material type, it can be seen that immediate JOLs only enhanced the D parameter for related 
pairs but not for unrelated pairs in Experiment 2, and list-level JOLs only enhanced the D 
parameter for target-target related pairs but not for target-target unrelated pairs in the free recall 
tests of Experiment 3.  Moreover, when there was salient semantic information to process within 
each study item (such as the strongly related pairs in Experiment 1 and the related pairs in 
Experiment 2), positive JOL reactivity was consistently located in the verbatim-based 
recollective parameters (the D and/or F parameters).  However, when there was a salient semantic 
relation between individual study items (such as the target-target related pairs in Experiment 3 
and the randomized categorical lists in Experiment 4), positive reactivity was located in both 
verbatim-based recollective parameters and gist-based non-recollective parameters (the R and/or 
J parameters).   
In terms of JOL type, there were quite different process-level patterns between immediate 
and prestudy JOLs and between item-level and list-level JOLs.  In Experiment 2, for related 
word pairs, immediate JOLs improved the D parameter and did not affect the F parameter, 
whereas prestudy JOLs did not affect the D parameter but increased the F parameter.  
Meanwhile, immediate JOLs had no effects on the D or F parameter for unrelated word pairs, but 
prestudy JOLs impaired the D parameter and increased the F parameter.  In Experiment 3, list-
level JOLs enhanced the D parameter for target-target related pairs in free recall, but item-level 
JOLs impaired it.  However, in Experiment 4, list-level JOLs impaired the D parameter for 
randomized categorical lists in free recall, while item-level JOLs improved it.  Thus, with the 
same materials and test format, different types of JOLs produced different effects on the 
underlying retrieval processes.   
In terms of test format, in Experiment 3, list-level JOLs enhanced the D, R, and J 
parameters for target-target related pairs in free recall, but they did not affect any parameter for the 
same materials in associative recall.  In contrast, list-level JOLs undermined the J parameters 
for target-target unrelated pairs in associative recall, but they had no effect on any parameter for the 
same materials in free recall.  Hence, not only the behavioral patterns but also the underlying 
process-level mechanism showed that JOL reactivity varied as a function of test format. 
In summary, the dual-retrieval models revealed that positive JOL reactivity was 
consistently accompanied by an increase in the D parameter, showing that the memory benefits 
of JOLs are partially attributed to the better recollection of item-specific verbatim details.  
Meanwhile, the effects of JOLs on underlying retrieval processes varied with the material type, 
JOL type, and test format, suggesting that JOL reactivity is flexible in adapting to the specific 
learning situation.  Here, although the D parameter results support Senkova and Otani’s (2021) 
item-specific hypothesis, the latter findings suggest that this hypothesis cannot fully account for 
JOL reactivity.  Specifically, the effects of JOLs on subsequent memory are not restricted to 
enhancing item-specific processing, although this is a consistent contributing factor.  Rather, 
what types of cues JOLs draw upon and strengthen depends on the specific type of processing 
that is stimulated by the study materials and the particular types of cues that JOLs slant 
processing toward.  Whether the strengthened cues would eventually lead to positive reactivity 
depends on whether they overlap with the types of cues that memory tests are sensitive to.  Thus, 
the model findings are consistent with the cue-strengthening hypothesis, which is discussed in 
more detail below. 
Theoretical Implications and Future Directions 
A Contextual Framework for Understanding JOL Reactivity 
Thus far, JOL reactivity has been established in many experiments (Dougherty et al., 
2005; Janes et al., 2018; Mitchum et al., 2016; Myers et al., 2020; Rivers et al., 2021; Senkova & 
Otani, 2021; Soderstrom et al., 2015; Tauber & Witherby, 2019; Tekin & Roediger, 2020; 
Witherby & Tauber, 2017b; Yang et al., 2015; Zechmeister & Shaughnessy, 1980; Zhao et al., 
2021).  Still, some other experiments failed to find the effect (Ariel et al., 2021; Benjamin et al., 
1998; Dougherty et al., 2018; Kelemen & Weaver III, 1997; Kornell & Bjork, 2008; Tauber & 
Rhodes, 2012).  Therefore, an obvious theoretical goal is to develop a coherent explanation of 
JOL reactivity that specifies when it will be present and when it will be absent.   
In that connection, it would be beneficial to adopt a contextual framework for 
understanding JOL reactivity.  In Jenkins's (1979) tetrahedral model of memory experiments, 
memory performance is considered as a contextual phenomenon based on four clusters of 
variables: subject characteristics (e.g., ability, interest), encoding tasks (e.g., directions or 
instructions provided at encoding), study materials (e.g., type of to-be-remembered materials), and 
criterial tests (e.g., recall, recognition; see also McDaniel & Butler, 2011; Roediger, 2008).  
Additionally, the tetrahedral model assumes that these variables interact with each other.  
Specifically, the model envisions the four clusters of variables as four corners of a tetrahedron.  
Thus, an edge between two corners represents a two-way interaction between the two variables, 
and a face of the tetrahedron represents a three-way interaction among three variables.   
It should be noted that the cue-strengthening hypothesis highlights the interactions 
between three dimensions in the tetrahedral model: study material, encoding task (i.e., JOL), and 
criterial test, as the hypothesis assumes that JOL reactivity only occurs when JOLs strengthen the 
cues embedded in the study materials, and the subsequent memory tests are sensitive to the 
strengthened cues.  Myers et al.’s (2020) experiments featured the interaction between two 
dimensions in the tetrahedral model: study materials and criterial test.  Based on their finding 
that item-level JOLs enhanced associative recall but not free recall for related pairs, they made 
an inference that is consistent with the cue-strengthening hypothesis: “the direction and strength 
of JOL reactivity depend on both the study material and type of final test” (p. 755).   
The behavioral and model findings in the current experiments were consistent with Myers 
et al.’s notion, as I discussed in the last section.  Moreover, in Experiments 2 and 3, my results 
showed that reactive effects are very different for prestudy and immediate JOLs and for item-
level and list-level JOLs.  Thus, my results extended Myers et al.’s inference in that JOL 
reactivity also depended on the third factor: JOL type, which fits into the “encoding task” 
dimension in the tetrahedral model.  Therefore, consistent with the cue-strengthening hypothesis, 
a three-way interaction among study materials, encoding task, and criterial test was detected, 
which indicates that the overlap among the cues that are embedded in the study materials, the 
cues that inform JOLs, and the cues that are used in the final memory test serves as a key 
determinant of JOL reactivity.   
The tetrahedral model is thus a promising framework for extending the cue-strengthening 
hypothesis and for further understanding JOL reactivity.  Future research should 
benefit from exploring the other dimensions in the tetrahedral model as well as the interactions 
among those dimensions.  For instance, future studies can investigate how subject characteristics 
affect JOL reactivity and how those variables interact with variables in other dimensions.  In that 
regard, some studies have investigated the developmental trend in JOL reactivity.  Tauber and 
Witherby (2019) found that although there was consistent positive JOL reactivity on younger 
adults’ associative recall for related word pairs, older adults’ recall was not affected by making 
JOLs.  Meanwhile, Zhao et al. (2021) found that making JOLs enhanced younger and older 
children’s recognition for word lists just as for young adults, and the magnitude of positive JOL 
reactivity increased with age.  Therefore, JOL reactivity seems to vary as a function of age.  
However, Zhao et al. manipulated JOL solicitation within participants, whereas Tauber and 
Witherby manipulated it between subjects.  Thus, it would be worth examining whether there 
would be positive JOL reactivity in older adults when the solicitation of JOLs is manipulated 
within subjects.  Interestingly, Zhao et al. (2021) also reported considerable individual 
differences in JOL reactivity.  When they decomposed the reactive effects of JOLs at the 
individual level, they found that although the majority of children experienced positive 
reactivity, there was a substantial proportion who did not.  Thus, it would be interesting to 
incorporate individual-level analyses in future JOL reactivity research and investigate what 
factors predict individual differences in JOL reactivity. 
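As a minimal sketch of what an individual-level analysis could look like, the example below assumes a within-participant design in which each participant has one proportion-correct score for items studied with JOLs and one for items studied without JOLs.  The function names and the data are hypothetical and made up for illustration; this is not the procedure used by Zhao et al. (2021).

```python
def reactivity_scores(jol, no_jol):
    """Per-participant reactivity: score with JOLs minus score without JOLs."""
    return {pid: jol[pid] - no_jol[pid] for pid in jol}


def proportion_positive(scores):
    """Proportion of participants whose reactivity score is above zero."""
    return sum(1 for s in scores.values() if s > 0) / len(scores)


# Made-up proportions correct, for illustration only (not real data).
jol = {"p1": 0.75, "p2": 0.50, "p3": 0.25, "p4": 0.50}
no_jol = {"p1": 0.50, "p2": 0.50, "p3": 0.50, "p4": 0.25}

scores = reactivity_scores(jol, no_jol)
print(scores)
print(proportion_positive(scores))
```

Difference scores of this kind could then be related to candidate predictors of individual differences, such as age.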
By the same token, it will also be beneficial to examine JOL reactivity with other types of 
study materials.  In the current study, I used relatively simple study materials such as word pairs 
and word lists.  Thus, one recommendation for future research is to examine JOL reactivity with 
more complex study materials, such as pictures, sentences, or text materials.  In terms of JOL 
reactivity with pictures, the only study I am aware of is Sommer et al. (1995), who reported that 
making JOLs improved recognition for face images.  In terms of text materials, Ariel et al. 
(2021) recently reported that making aggregate or term-specific JOLs for a piece of science text 
did not enhance participants’ performance on later short-answer questions unless overt retrieval 
was prompted before JOLs.  Here, aggregate JOLs were solicited after reading a complete piece 
of science text (e.g., how confident are you that you understand the text), while term-specific 
JOLs were solicited after reading a subsection of the science text that is specifically devoted to a 
single concept (e.g., “how confident are you that you understand how minerals are made”; p. 
700).   
Why did JOLs produce robust positive reactivity for related word pairs and word lists, 
but not for longer text materials?  Ariel et al. proposed that it is because JOLs prompt different 
retrieval dynamics with the former than the latter.  Namely, with more complex materials such as 
science text, making JOLs may elicit less effortful retrieval and earlier termination than for 
simpler materials such as word pairs and word lists. In support of the retrieval dynamic proposal, 
Ariel et al. found that if participants were asked short-answer questions about the text as a 
retrieval prompt before making a JOL, positive JOL reactivity emerged again.  It is also worth 
noting that JOLs were studied at the global level with text materials, as they were solicited after 
at least a cluster of words.  On a related note, the present dissertation showed that list-level JOLs, 
which are also a type of global-level JOLs, did not display stable reactivity: They produced 
positive reactivity on free recall for target-target related pairs in Experiment 3 but not for 
blocked categorical lists in Experiment 4.  Thus, it is possible that the reactivity of JOLs solicited at 
the global level is, in general, less robust than that of JOLs solicited at the local level.  In brief, 
the causes of the volatility of global-level JOLs for complex materials remain an open question 
and merit further investigation.   
In sum, it is recommended that future JOL reactivity research adopt a contextual 
framework that is based on the tetrahedral model of memory experiments, which serves as a 
scaffolding for extending the cue-strengthening hypothesis.  Accordingly, researchers are 
encouraged to explore the less studied dimensions and the interactions among different 
dimensions in the tetrahedral model, such as investigating how JOL reactivity varies with 
individual difference variables and the complexity of study materials. 
Implications for Research on Other Encoding Tasks  
In Chapter 1, I discussed how the literature on various encoding tasks provides 
critical implications for JOL reactivity.  In turn, the present findings on JOL reactivity also 
provide implications for research on other encoding tasks.  First and most obvious, 
researchers should consider JOLs as an independent encoding task rather than a pure 
metamemory measurement.  As evident in Soderstrom et al. (2015) and Tekin and Roediger 
(2020), the solicitation of JOLs substantially attenuated the generation effect and the depth-of-
processing effect.  Namely, making JOLs decreased the memory difference between generated 
versus read items and between deeply processed versus shallowly processed items.  Thus, when 
both JOLs and another encoding task are implemented in an experimental design, researchers 
should always consider the possibility that the memory effects of the encoding task of interest 
may be moderated by JOLs.  
Second, the overlaps in the surface format, memory effects, and theoretical explanations 
between JOLs and other encoding tasks suggest they may be considered in a unified theoretical 
framework.  JOLs and other common encoding tasks, such as deep processing or survival 
processing, all require participants to make judgments about study materials during encoding, 
which then produce reliable effects on memory performance.  Meanwhile, the cue-
strengthening hypothesis partly builds upon the transfer-appropriate multifactor account of the 
generation effect, which indicates that the effects of both JOLs and generation are assumed to be 
subject to the principle of transfer-appropriateness.  Considering the resemblance between JOLs 
and other common encoding tasks, research on those tasks should also benefit 
from adopting a contextual framework that emphasizes the interactions among subject 
characteristics, study materials, encoding tasks, and criterial tests.   
Taking the deep processing task as an illustration, evidence has shown that the level-of-
processing effect is constrained by test format and material type.  Morris et al. (1977) showed 
that although a deep processing task (semantic-oriented) produced superior memory compared to 
a shallow processing task (phonetic-oriented) with a standard recognition test, the pattern was 
reversed with a rhyming recognition test.  On the rhyming recognition test, the to-be-
remembered words were not the original study words but words that rhymed with the study 
words.  In this case, Morris et al. found that the shallow processing task led to better performance 
than the deep processing task, suggesting that the depth-of-processing effect depends on the 
match between encoding task and test format.   
Moreover, deep processing was demonstrated to increase recall of critical distractors 
(e.g., sleep) for semantic Deese-Roediger-McDermott (DRM) lists (e.g., a list of words that are 
forward associates of “sleep”, such as bed, doze, awake, nap, yawn, …) compared to shallow 
processing (Thapar & McDermott, 2001; Toglia et al., 1999).  However, Chan et al. (2005) 
found that shallow processing produced higher recall of critical distractors for phonological 
DRM lists (e.g., a list of words that sound like “sleep”, such as sweep, steep, sleet, slop, 
heap, …) than deep processing.  This indicates that the depth-of-processing effect was also 
subject to the interaction between encoding task and material type. 
Similarly, the survival processing effect has been found to vary as a function of material 
type and test format.  Butler et al. (2009) used three types of word lists: a list that is relevant to a 
grassland survival scenario, a list that is relevant to a bank robbery scenario, and a list that is 
irrelevant to both scenarios.  When participants studied the three word lists, they were asked to 
rate the words’ relevance to either the grassland survival or the bank robbery scenario.  Here, 
Butler et al. found an interaction between list type and rating instruction: The recall performance 
for irrelevant lists was comparable between the two rating conditions, whereas survival rating led 
to better recall for survival-relevant lists, and robbery rating led to better recall for robbery-
related lists.  This suggests that the memory benefits of survival processing rely on the congruity 
between the content of study materials and the encoding task (but see Nairne & Pandeirada, 
2011).  Meanwhile, Broder et al. (2011) found that the memory benefits of survival 
processing did not extend from item memory tests to source memory tests, demonstrating that test 
format is a boundary condition for the survival processing effect.  
To sum up, researchers should always consider the likelihood of interactions between 
JOLs and other encoding tasks, because JOLs are themselves an independent encoding task.  
More important, JOLs closely resemble common encoding 
tasks such as deep processing and survival processing.  In addition to the surface similarity that 
they all solicit judgments about study materials during encoding, their memory effects all vary 
with material type, encoding task, and test format.  Thus, it is recommended that researchers 
adopt a contextual framework to investigate the memory effects of both JOLs and other 
similar encoding tasks.  
Questions That Remain to Be Answered 
One important question that merits further consideration for the cue-strengthening 
hypothesis is under what conditions JOLs strengthen diagnostic cues.  Recall that the cue-
strengthening hypothesis is based on a combination of the cue-utilization framework 
for JOLs (Koriat, 1997) and the transfer-appropriate multifactor account of the generation effect (de 
Winstanley et al., 1996).  Here, the cue-utilization framework stresses that JOLs are made based 
on three types of cues: intrinsic, extrinsic, and mnemonic cues.  The transfer-appropriate 
multifactor account posits that “the act of generation strengthens whatever type of information is 
used by the learner to complete the generation task” (Soderstrom et al., 2015, p. 554).  Therefore, 
the cue-strengthening hypothesis seems to assume that JOLs would strengthen whatever cues 
are used in forming the JOLs.   
However, my results suggest that this is not necessarily the case.  For instance, in 
Experiment 3, item-level JOLs were higher for target-target related pairs than for target-target 
unrelated pairs, suggesting that inter-pair relations were processed when making item-level 
JOLs.  However, there was no reactivity for item-level JOLs even when the later free recall test 
was sensitive to inter-pair relations, which suggests that item-level JOLs did not strengthen 
processing for inter-pair relations, at least not in a statistically detectable manner.  Such results 
cast doubt on the assumption that whatever cues are used in JOLs would be strengthened by 
the act of making JOLs.  In that connection, Soderstrom et al. (2015) suggested that cue salience 
may be a precondition for cue strengthening.  That is, only cues that are easily discernable (e.g., 
cue-target relation in strongly related pairs) would be efficiently used in forming JOLs and thus 
be strengthened, which explains why reactivity was much weaker for weakly related or unrelated 
pairs.  However, such an account would have difficulty explaining why positive reactivity arises 
for weakly related pairs in Experiment 1 of the present dissertation and in Tauber and Witherby 
(2019; Experiments 3, 4, & 5).  Notably, the magnitude of positive reactivity was comparable 
between strongly related and weakly related pairs in the former, and it was even numerically 
larger for weakly related than for strongly related pairs in the latter.  Therefore, further 
research is required to specify the exact conditions under which JOLs strengthen the cues 
embedded in study materials.   
Another question that remains to be answered is whether JOL reactivity stems from 
incidental improvement in learning processes or intentional, strategic responses to the demand 
for self-assessment (Double et al., 2018).  As Double et al. discussed, although JOLs are only 
intended to tap retrospective evaluation for items that are just studied, they can also prime 
prospective evaluation for to-be-studied items as they are repeatedly solicited throughout the 
learning process.  Thus, it is possible that a JOL made for the prior item can provide feedback for 
adjusting study strategies for the next items.  However, my results in Experiment 2 did not 
support this speculation, as prestudy JOLs, which should prompt prospective evaluation and 
motivate participants to update their learning strategies, did not induce positive reactivity.  
Nevertheless, this result does not necessarily rule out the possibility that strategic responses are 
involved in JOL reactivity.  To further investigate this issue, self-report measures about 
metacognitive experiences could be administered (e.g., Mitchum et al., 2016; Rivers et al., 
2021).  For example, researchers can ask participants to report whether they consciously 
engage in different processing strategies for items that are followed by a JOL versus 
those that are not followed by a JOL.   
The last question is whether JOL reactivity varies as a function of JOL accuracy.  That is, 
if people make more accurate JOLs, would JOLs directly enhance memory to a larger extent?  
Here, Double (2019) examined a mirrored version of the question: whether JOLs impair 
memory when they are less accurate.  He manipulated the font sizes of a pure list of related 
pairs and a pure list of unrelated pairs.  Font size has been widely studied as a cue that induces a 
dissociation between JOLs and actual memory, where larger font sizes reliably increase JOLs but 
do not necessarily improve memory (Chang & Brainerd, 2022).  Thus, Double expected that 
when people’s attention was captured by font size, which is a salient but not diagnostic cue, they 
may base JOLs primarily on font size rather than other less salient but diagnostic cues.  
Consequently, making JOLs would strengthen the processing of uninformative cues, which should 
thus impair rather than benefit future recall.  His findings were consistent with his hypothesis.  
However, such findings await replication.  Meanwhile, it would be informative to 
investigate whether improving JOL accuracy, such as by providing metacognitive training, 
would enhance JOL reactivity. 
Concluding Comments 
The behavioral findings in the present dissertation were more consistent with the 
predictions of the cue-strengthening hypothesis than of the changed-goal hypothesis, thus 
offering preferential support for the former.  Moreover, the dual-retrieval model results 
demonstrated that although enhanced recollection was a hallmark of positive JOL reactivity, it 
was not the sole component, as JOLs also enhanced non-recollective operations when interitem 
relations rather than item-specific cues were featured in study materials.  Further, the process-
level pattern of JOL reactivity depended heavily on the overlap in cues among study materials, 
JOL tasks, and memory tests.  Thus, it is recommended that future studies adopt a contextual 
framework for understanding JOL reactivity. 
  
References 
Arbuckle, T. Y., & Cuddy, L. L. (1969). Discrimination of item strength at time of presentation. 
Journal of Experimental Psychology, 81(1), 126–131. https://doi.org/10.1037/h0027455 
Ariel, R., Dunlosky, J., & Bailey, H. (2009). Agenda-based regulation of study-time allocation: 
When agendas override item-based monitoring. Journal of Experimental Psychology: 
General, 138(3), 432–447. https://doi.org/10.1037/a0015928 
Ariel, R., Karpicke, J. D., Witherby, A. E., & Tauber, S. K. (2021). Do judgments of learning 
directly enhance learning of educational materials? Educational Psychology Review, 
33(2), 693–712. https://doi.org/10.1007/s10648-020-09556-8 
Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). The mismeasure of memory: When retrieval 
fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: 
General, 127(1), 55–68. https://doi.org/10.1037/0096-3445.127.1.55 
Besken, M., & Mulligan, N. W. (2013). Easily perceived, easily remembered? Perceptual 
interference produces a double dissociation between metamemory and memory 
performance. Memory & Cognition, 41(6), 897–903. https://doi.org/10.3758/s13421-013-
0307-8 
Bowen, H. J., Gallant, S. N., & Moon, D. H. (2020). Influence of reward motivation on directed 
forgetting in younger and older adults. Frontiers in Psychology, 11, 1764. 
https://doi.org/10.3389/fpsyg.2020.01764 
Bower, G. H., & Karlin, M. B. (1974). Depth of processing pictures of faces and recognition 
memory. Journal of Experimental Psychology, 103(4), 751–757. 
https://doi.org/10.1037/h0037190 
 
Brainerd, C. J., & Reyna, V. F. (1998). Fuzzy-trace theory and children’s false memories. 
Journal of Experimental Child Psychology, 71(2), 81–129. 
https://doi.org/10.1006/jecp.1998.2464 
Brainerd, C. J., & Reyna, V. F. (2010). Recollective and nonrecollective recall. Journal of 
Memory and Language, 63(3), 425–445. https://doi.org/10.1016/j.jml.2010.05.002 
Brainerd, C. J., Reyna, V. F., & Howe, M. L. (2009). Trichotomous processes in early memory 
development, aging, and neurocognitive impairment: A unified theory. Psychological 
Review, 116(4), 783–832. https://doi.org/10.1037/a0016963 
Brainerd, C. J., Wright, R., Reyna, V. F., & Payne, D. G. (2002). Dual-retrieval processes in free 
and associative recall. Journal of Memory and Language, 46(1), 120–152. 
https://doi.org/10.1006/jmla.2001.2796 
Broder, A., Krüger, N., & Schütte, S. (2011). The survival processing memory effect should 
generalise to source memory, but it doesn’t. Psychology, 2(9), 896–901. 
https://doi.org/10.4236/psych.2011.29135 
Butler, A. C., Kang, S. H. K., & Roediger, H. L. (2009). Congruity effects between materials and 
processing tasks in the survival processing paradigm. Journal of Experimental 
Psychology: Learning, Memory, and Cognition, 35(6), 1477–1486. 
https://doi.org/10.1037/a0017024 
Castel, A. D. (2008). Metacognition and learning about primacy and recency effects in free 
recall: The utilization of intrinsic and extrinsic cues when making judgments of learning. 
Memory & Cognition, 36(2), 429–437. https://doi.org/10.3758/MC.36.2.429 
Castel, A. D., McCabe, D. P., & Roediger, H. L. (2007). Illusions of competence and 
overestimation of associative memory for identical items: Evidence from judgments of 
learning. Psychonomic Bulletin & Review, 14(1), 107–111. 
https://doi.org/10.3758/BF03194036 
Chan, J. C. K., McDermott, K. B., Watson, J. M., & Gallo, D. A. (2005). The importance of 
material-processing interactions in inducing false memories. Memory & Cognition, 33(3), 
389–395. https://doi.org/10.3758/BF03193057 
Chang, M. (2019). Dual-retrieval models and metamemory in younger and older adults 
[Unpublished master’s thesis, Cornell University]. 
https://ecommons.cornell.edu/handle/1813/70006 
Chang, M., & Brainerd, C. J. (2022). Association and dissociation between judgments of 
learning and memory: A meta-analysis of the font size effect. Metacognition and 
Learning. https://doi.org/10.1007/s11409-021-09287-3 
Coltheart, M. (1981). The MRC Psycholinguistic Database. The Quarterly Journal of 
Experimental Psychology Section A, 33(4), 497–505. 
https://doi.org/10.1080/14640748108400805 
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory 
research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684. 
https://doi.org/10.1016/S0022-5371(72)80001-X 
Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic 
memory. Journal of Experimental Psychology: General, 104(3), 268–294. 
https://doi.org/10.1037/0096-3445.104.3.268 
de Winstanley, P. A., Bjork, E. L., & Bjork, R. A. (1996). Generation effects and the lack 
thereof: The role of transfer-appropriate processing. Memory, 4(1), 31–48. 
https://doi.org/10.1080/741940667 
Double, K. S. (2019). Do judgments of learning impair recall when uninformative cues are 
salient? PsyArXiv. https://doi.org/10.31234/osf.io/a5bxw 
Double, K. S., & Birney, D. P. (2019). Reactivity to measures of metacognition. Frontiers in 
Psychology, 10, 2755. https://doi.org/10.3389/fpsyg.2019.02755 
Double, K. S., Birney, D. P., & Walker, S. A. (2018). A meta-analysis and systematic review of 
reactivity to judgements of learning. Memory, 26(6), 741–750. 
https://doi.org/10.1080/09658211.2017.1404111 
Dougherty, M. R., Robey, A. M., & Buttaccio, D. (2018). Do metacognitive judgments alter 
memory performance beyond the benefits of retrieval practice? A comment on and 
replication attempt of Dougherty, Scheck, Nelson, and Narens (2005). Memory & 
Cognition, 46(4), 558–565. https://doi.org/10.3758/s13421-018-0791-y 
Dougherty, M. R., Scheck, P., Nelson, T., & Narens, L. (2005). Using the past to predict the 
future. Memory & Cognition, 33(6), 1096–1115. https://doi.org/10.3758/BF03193216 
Dunlosky, J., & Ariel, R. (2011). Self-regulated learning and the allocation of study time. In B. 
H. Ross (Ed.), Psychology of Learning and Motivation (Vol. 54, pp. 103–140). Academic 
Press. https://doi.org/10.1016/B978-0-12-385527-5.00004-8 
Dunlosky, J., & Hertzog, C. (1998). Training programs to improve learning in later adulthood: 
Helping older adults educate themselves. In D. J. Hacker, J. Dunlosky, & A. C. Graesser 
(Eds.), Metacognition in educational theory and practice (pp. 249–275). Lawrence 
Erlbaum Associates Publishers. 
Geller, J. (2017). Would disfluency by any other name still be disfluent? Examining the boundary 
conditions of the disfluency effect [Doctoral dissertation, Iowa State University]. 
https://lib.dr.iastate.edu/etd/15520/ 
Gomes, C. F. A., Brainerd, C. J., Nakamura, K., & Reyna, V. F. (2014). Markovian 
interpretations of dual retrieval processes. Journal of Mathematical Psychology, 59, 50–
64. https://doi.org/10.1016/j.jmp.2013.07.003 
Halamish, V. (2018). Can very small font size enhance memory? Memory & Cognition, 46(6), 
979–993. https://doi.org/10.3758/s13421-018-0816-6 
Ikeda, K., Yue, C. L., Murayama, K., & Castel, A. D. (2016). Achievement goals affect 
metacognitive judgments. Motivation Science, 2(4), 199–219. 
https://doi.org/10.1037/mot0000047 
Janes, J. L., Rivers, M. L., & Dunlosky, J. (2018). The influence of making judgments of 
learning on memory performance: Positive, negative, or both? Psychonomic Bulletin & 
Review, 25(6), 2356–2364. https://doi.org/10.3758/s13423-018-1463-4 
Jenkins, J. J. (1979). Four points to remember: A tetrahedral model of memory experiments. In 
L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 429–
446). Hillsdale, NJ: Erlbaum Associates. 
Jensen, A. R., & Rohwer, W. D. (1963). Verbal mediation in paired-associate and serial learning. 
Journal of Verbal Learning and Verbal Behavior, 1(5), 346–352. 
https://doi.org/10.1016/S0022-5371(63)80015-8 
Karpicke, J. D. (2017). Retrieval-based learning: A decade of progress. In J. Wixted (Ed.), 
Cognitive psychology of memory, Vol. 2 of Learning and memory: A comprehensive 
reference (J. H. Byrne, Series Ed., pp. 487-514). http://dx.doi.org/10.1016/B978-0-12-
809324-5.21055-9 
Kelemen, W. L., & Weaver, C. A., III. (1997). Enhanced metamemory at delays: Why do 
judgments of learning improve over time? Journal of Experimental Psychology: Learning, 
Memory, and Cognition, 23(6), 1394–1409. https://doi.org/10.1037/0278-7393.23.6.1394 
King, J. F., Zechmeister, E. B., & Shaughnessy, J. J. (1980). Judgments of knowing: The 
influence of retrieval practice. The American Journal of Psychology, 93(2), 329–343. 
https://doi.org/10.2307/1422236 
Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to 
judgments of learning. Journal of Experimental Psychology: General, 126(4), 349–370. 
https://doi.org/10.1037/0096-3445.126.4.349 
Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one’s knowledge 
during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 
31(2), 187–194. 
Kornell, N., & Bjork, R. A. (2008). Optimising self-regulated study: The benefits—and costs—
of dropping flashcards. Memory, 16(2), 125–136. 
https://doi.org/10.1080/09658210701763899 
Maxwell, N. P., & Huff, M. J. (2021). The deceptive nature of associative word pairs: The 
effects of associative direction on judgments of learning. Psychological Research, 85, 
1757–1775. https://doi.org/10.1007/s00426-020-01342-z 
Mazzoni, G., & Nelson, T. O. (1995). Judgments of learning are affected by the kind of encoding 
in ways that cannot be attributed to the level of recall. Journal of Experimental 
Psychology: Learning, Memory, and Cognition, 21(5), 1263–1274. 
https://doi.org/10.1037/0278-7393.21.5.1263 
McDaniel, M. A., & Butler, A. C. (2011). A contextual framework for understanding when 
difficulties are desirable. In A. S. Benjamin (Ed.), Successful remembering and successful 
forgetting: A festschrift in honor of Robert A. Bjork (pp. 175–198). Psychology Press. 
Metcalfe, J., & Finn, B. (2008). Evidence that judgments of learning are causally related to study 
choice. Psychonomic Bulletin & Review, 15(1), 174–179. 
https://doi.org/10.3758/PBR.15.1.174 
Metcalfe, J., & Kornell, N. (2005). A region of proximal learning model of study time allocation. 
Journal of Memory and Language, 52(4), 463–477. 
https://doi.org/10.1016/j.jml.2004.12.001 
Mitchum, A. L., Kelley, C. M., & Fox, M. C. (2016). When asking the question changes the 
ultimate answer: Metamemory judgments change memory. Journal of Experimental 
Psychology: General, 145(2), 200–219. https://doi.org/10.1037/a0039923 
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer 
appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16(5), 519–
533. https://doi.org/10.1016/S0022-5371(77)80016-9 
Mueller, M. L., Dunlosky, J., & Tauber, S. K. (2016). The effect of identical word pairs on 
people’s metamemory judgments: What are the contributions of processing fluency and 
beliefs about memory? Quarterly Journal of Experimental Psychology, 69(4), 781–799. 
https://doi.org/10.1080/17470218.2015.1058404 
Mueller, M. L., Tauber, S. K., & Dunlosky, J. (2013). Contributions of beliefs and processing 
fluency to the effect of relatedness on judgments of learning. Psychonomic Bulletin & 
Review, 20(2), 378–384. https://doi.org/10.3758/s13423-012-0343-6 
Myers, S. J., Rhodes, M. G., & Hausman, H. E. (2020). Judgments of learning (JOLs) selectively 
improve memory depending on the type of test. Memory & Cognition, 48(5), 745–758. 
https://doi.org/10.3758/s13421-020-01025-5 
Nairne, J. S., & Pandeirada, J. N. S. (2011). Congruity effects in the survival processing 
paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 
37(2), 539–549. https://doi.org/10.1037/a0021960 
Nairne, J. S., Thompson, S. R., & Pandeirada, J. N. S. (2007). Adaptive memory: Survival 
processing enhances retention. Journal of Experimental Psychology: Learning, Memory, 
and Cognition, 33(2), 263–273. https://doi.org/10.1037/0278-7393.33.2.263 
Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free 
association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, 
& Computers, 36(3), 402–407. https://doi.org/10.3758/BF03195588 
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. In 
G. H. Bower (Ed.), Psychology of Learning and Motivation (Vol. 26, pp. 125–173). 
Academic Press. https://doi.org/10.1016/S0079-7421(08)60053-5 
Palan, S., & Schitter, C. (2018). Prolific.ac—A subject pool for online experiments. Journal of 
Behavioral and Experimental Finance, 17, 22–27. 
https://doi.org/10.1016/j.jbef.2017.12.004 
Price, J., & Harrison, A. (2017). Examining what prestudy and immediate judgments of learning 
reveal about the bases of metamemory judgments. Journal of Memory and Language, 94, 
177–194. https://doi.org/10.1016/j.jml.2016.12.003 
Rhodes, M. G. (2016). Judgments of learning: Methods, data, and theory. In J. Dunlosky & S. K. 
Tauber (Eds.), The Oxford handbook of metamemory (pp. 65–80). Oxford University 
Press. 
Rivers, M. L., & Dunlosky, J. (2021). Are test-expectancy effects better explained by changes in 
encoding strategies or differential test experience? Journal of Experimental Psychology: 
Learning, Memory, and Cognition, 47(2), 195–207. https://doi.org/10.1037/xlm0000949 
Rivers, M. L., Janes, J. L., & Dunlosky, J. (2021). Investigating memory reactivity with a within-
participant manipulation of judgments of learning: Support for the cue-strengthening 
hypothesis. Memory, 29(10), 1342–1353. 
https://doi.org/10.1080/09658211.2021.1985143 
Roediger, H. L., III. (2008). Relativity of remembering: Why the laws of memory vanished. 
Annual Review of Psychology, 59(1), 225–254. 
https://doi.org/10.1146/annurev.psych.57.102904.190139 
Rosner, T. M., Davis, H., & Milliken, B. (2015). Perceptual blurring and recognition memory: A 
desirable difficulty effect revealed. Acta Psychologica, 160, 11–22. 
https://doi.org/10.1016/j.actpsy.2015.06.006 
Sahakyan, L., Delaney, P. F., & Kelley, C. M. (2004). Self-evaluation as a moderating factor of 
strategy change in directed forgetting benefits. Psychonomic Bulletin & Review, 11(1), 
131–136. https://doi.org/10.3758/BF03206472 
Schäfer, F., & Undorf, M. (2021). Positive and negative reactivity in judgments of learning: 
Shared or distinct mechanisms? 63rd Conference of Experimental Psychologists, Ulm, 
Germany. 
Schwenn, E. A., & Underwood, B. J. (1968). The effect of formal and associative similarity on 
paired-associate and free-recall learning. Journal of Verbal Learning & Verbal Behavior, 
7(4), 817–824. https://doi.org/10.1016/S0022-5371(68)80147-1 
Senkova, O., & Otani, H. (2021). Making judgments of learning enhances memory by inducing 
item-specific processing. Memory & Cognition, 49, 955–967. 
https://doi.org/10.3758/s13421-020-01133-2 
Soderstrom, N. C., Clark, C. T., Halamish, V., & Bjork, E. L. (2015). Judgments of learning as 
memory modifiers. Journal of Experimental Psychology: Learning, Memory, and 
Cognition, 41(2), 553–558. https://doi.org/10.1037/a0038388 
Soderstrom, N. C., & McCabe, D. P. (2011). The interplay between value and relatedness as 
bases for metacognitive monitoring and control: Evidence for agenda-based monitoring. 
Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(5), 1236–
1242. https://doi.org/10.1037/a0023548 
Sommer, W., Heinz, A., Leuthold, H., Matt, J., & Schweinberger, S. (1995). Metamemory, 
distinctiveness, and event-related potentials in recognition memory for faces. Memory & 
Cognition, 23, 1–11. https://doi.org/10.3758/BF03210552 
Son, L., & Metcalfe, J. (2000). Metacognitive and control strategies in study-time allocation. 
Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(1), 204–221. 
https://doi.org/10.1037/0278-7393.26.1.204 
Stevens, A. S., & Pierce, B. H. (2019). Do reactive effects of judgments of learning extend to 
word lists? 2019 Annual Meeting of the Psychonomic Society, Montreal, QC, Canada. 
Tauber, S. K., & Rhodes, M. G. (2012). Measuring memory monitoring with judgements of 
retention (JORs). Quarterly Journal of Experimental Psychology, 65(7), 1376–1396. 
https://doi.org/10.1080/17470218.2012.656665 
Tauber, S. K., & Witherby, A. E. (2019). Do judgments of learning modify older adults’ actual 
learning? Psychology and Aging, 34(6), 836–847. https://doi.org/10.1037/pag0000376 
Tekin, E., & Roediger, H. L. (2020). Reactivity of judgments of learning in a levels-of-
processing paradigm. Zeitschrift Für Psychologie, 228(4), 278–290. 
https://doi.org/10.1027/2151-2604/a000425 
Thapar, A., & McDermott, K. B. (2001). False recall and false recognition induced by 
presentation of associated words: Effects of retention interval and level of processing. 
Memory & Cognition, 29(3), 424–432. https://doi.org/10.3758/BF03196393 
Toglia, M. P., Neuschatz, J. S., & Goodwin, K. A. (1999). Recall accuracy and illusory 
memories: When more is less. Memory, 7(2), 233–256. 
https://doi.org/10.1080/741944069 
Underwood, B. J., Ekstrand, B. R., & Keppel, G. (1965). An analysis of intralist similarity in 
verbal learning with experiments on conceptual similarity. Journal of Verbal Learning 
and Verbal Behavior, 4(6), 447–462. https://doi.org/10.1016/S0022-5371(65)80042-1 
Undorf, M., & Bröder, A. (2020). Cue integration in metamemory judgements is strategic. 
Quarterly Journal of Experimental Psychology, 73(4), 629–642. 
https://doi.org/10.1177/1747021819882308 
Van Overschelde, J. P., Rawson, K. A., & Dunlosky, J. (2004). Category norms: An updated and 
expanded version of the Battig and Montague (1969) norms. Journal of Memory and 
Language, 50(3), 289–335. https://doi.org/10.1016/j.jml.2003.10.003 
Wilton, R. N. (2006). Interactive imagery and colour in paired-associate learning. Acta 
Psychologica, 121(1), 21–40. https://doi.org/10.1016/j.actpsy.2005.05.006 
Witherby, A. E., & Tauber, S. K. (2017a). The concreteness effect on judgments of learning: 
Evaluating the contributions of fluency and beliefs. Memory & Cognition, 45(4), 639–
650. https://doi.org/10.3758/s13421-016-0681-0 
Witherby, A. E., & Tauber, S. K. (2017b). The influence of judgments of learning on long-term 
learning and short-term performance. Journal of Applied Research in Memory and 
Cognition, 6(4), 496–503. https://doi.org/10.1016/j.jarmac.2017.08.004 
Yang, H., Cai, Y., Liu, Q., Zhao, X., Wang, Q., Chen, C., & Xue, G. (2015). Differential neural 
correlates underlie judgment of learning and subsequent memory performance. Frontiers 
in Psychology, 6, 1699. https://doi.org/10.3389/fpsyg.2015.01699 
Yu, Y., Jiang, Y., & Li, F. (2020). The effect of value on judgment of learning in tradeoff 
learning condition: The mediating role of study time. Metacognition and Learning, 15, 
435–454. https://doi.org/10.1007/s11409-020-09234-8 
Yue, C. L., Castel, A. D., & Bjork, R. A. (2013). When disfluency is—and is not—a desirable 
difficulty: The influence of typeface clarity on metacognitive judgments and memory. 
Memory & Cognition, 41(2), 229–241. https://doi.org/10.3758/s13421-012-0255-8 
Zechmeister, E. B., & Shaughnessy, J. J. (1980). When you know that you know and when you 
think that you know but you don’t. Bulletin of the Psychonomic Society, 15(1), 41–44. 
https://doi.org/10.3758/BF03329756 
Zhao, W., Li, B., Shanks, D. R., Zhao, W., Zheng, J., Hu, X., Su, N., Fan, T., Yin, Y., Luo, L., & 
Yang, C. (2021). When judging what you know changes what you really know: Soliciting 
metamemory judgments reactively enhances children’s learning. Child Development, 93, 
405–417. https://doi.org/10.1111/cdev.13689 
 
Appendix A 
I used slightly different versions of the dual-retrieval model for Experiments 1, 2, and 3 
versus Experiment 4 to accommodate the methodological differences between them: 
word pairs and associative or free recall tests were used in Experiments 1, 2, and 3, 
whereas lists of single words and free recall tests were used in Experiment 4.  The dual-
retrieval model used in the first three experiments is described below: 
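\begin{align}
p(\mathrm{CCC}) &= D(1 - F) + (1 - D) R J_1 J_2 J_3 \tag{A1} \\
p(\mathrm{CCE}) &= (1 - D) R J_1 J_2 (1 - J_3) \tag{A2} \\
p(\mathrm{CEC}) &= (1 - D) R J_1 (1 - J_2) J_3 \tag{A3} \\
p(\mathrm{CEE}) &= DF + (1 - D) R J_1 (1 - J_2)(1 - J_3) \tag{A4} \\
p(\mathrm{ECC}) &= (1 - D) R (1 - J_1) J_2 J_3 \tag{A5} \\
p(\mathrm{ECE}) &= (1 - D) R (1 - J_1) J_2 (1 - J_3) \tag{A6} \\
p(\mathrm{EEC}) &= (1 - D) R (1 - J_1)(1 - J_2) J_3 \tag{A7} \\
p(\mathrm{EEE}) &= (1 - D) R (1 - J_1)(1 - J_2)(1 - J_3) + (1 - D)(1 - R) \tag{A8}
\end{align}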
where D is the probability that the verbatim trace of an item’s presentation can be directly 
accessed on a recall test, R is the probability that an item can be reconstructed on a recall test 
when the verbatim trace of the item’s presentation cannot be accessed, F is the probability that 
direct access succeeds on the first recall test but fails on both of the two subsequent 
recall tests, and $J_1$, $J_2$, and $J_3$ are the probabilities that a reconstructed item is judged to be 
familiar enough to output on test 1, test 2, and test 3, respectively. 
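
As an illustrative sketch only (not the estimation code used in the dissertation), Equations (A1) to (A8) can be written as a short Python function. It assumes that C and E denote correct recall and error on each of the three tests; the function name and the parameter values in the usage lines are hypothetical, chosen simply to show that the eight pattern probabilities sum to 1.

```python
def pattern_probabilities(D, R, F, J1, J2, J3):
    """Predicted probabilities of the eight recall patterns, Equations (A1)-(A8).

    Assumption: C = correct recall and E = error on tests 1, 2, and 3.
    """
    recon = (1 - D) * R  # no direct access, but the item can be reconstructed
    return {
        "CCC": D * (1 - F) + recon * J1 * J2 * J3,
        "CCE": recon * J1 * J2 * (1 - J3),
        "CEC": recon * J1 * (1 - J2) * J3,
        "CEE": D * F + recon * J1 * (1 - J2) * (1 - J3),
        "ECC": recon * (1 - J1) * J2 * J3,
        "ECE": recon * (1 - J1) * J2 * (1 - J3),
        "EEC": recon * (1 - J1) * (1 - J2) * J3,
        "EEE": recon * (1 - J1) * (1 - J2) * (1 - J3) + (1 - D) * (1 - R),
    }


# Hypothetical parameter values, for illustration only.
probs = pattern_probabilities(D=0.4, R=0.5, F=0.1, J1=0.6, J2=0.7, J3=0.8)
total = sum(probs.values())  # the eight patterns are exhaustive
```

The direct-access terms sum to D, the reconstruction terms sum to (1 - D)R, and the remaining (1 - D)(1 - R) falls in EEE, so the probabilities sum to 1 for any parameter values between 0 and 1.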
The dual-retrieval model used in Experiment 4 was slightly modified with regard to the F 
parameter in Equations (A1) and (A2): p(CCC) was expressed as $D(1 - F)(1 - F) + (1 - D) R J_1 J_2 J_3$, 
and p(CCE) was expressed as $D(1 - F)F + (1 - D) R J_1 J_2 (1 - J_3)$.  The 
only difference is that the version used in Experiments 1, 2, and 3 assumes that the 
forgetting status remains invariant between the second and third recall tests, whereas the 
Experiment 4 version also covers the case in which participants retain direct access on the 
second recall test but lose it on the third.  That is, the F parameter no longer stands for the 
probability of forgetting on both recall tests 2 and 3.  Instead, it is defined as the probability that 
participants lose direct access due to forgetting on either the second or the third recall test, with the 
assumption that this probability is equal for the two recall tests.   
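
The effect of this modification on the direct-access terms can be sketched as follows. This is a hedged illustration: the function name and the `version` labels are hypothetical, and the sketch assumes that the DF term in Equation (A4) is left unchanged, since only (A1) and (A2) are described as modified.

```python
def direct_access_terms(D, F, version="exp1to3"):
    """Direct-access contribution to the CCC, CCE, and CEE patterns.

    version="exp1to3": Equations (A1) and (A4) as written.
    version="exp4":    modified (A1) and (A2); (A4) is assumed unchanged.
    """
    if version == "exp1to3":
        # Forgetting status is the same on tests 2 and 3, so CCE gets nothing.
        return {"CCC": D * (1 - F), "CCE": 0.0, "CEE": D * F}
    if version == "exp4":
        # Access can be retained on test 2 and then lost on test 3.
        return {"CCC": D * (1 - F) ** 2, "CCE": D * (1 - F) * F, "CEE": D * F}
    raise ValueError(f"unknown version: {version}")


D, F = 0.4, 0.1  # hypothetical values, for illustration only
old_terms = direct_access_terms(D, F, "exp1to3")
new_terms = direct_access_terms(D, F, "exp4")
```

Under that assumption, the direct-access terms sum to D in both versions, so the eight pattern probabilities still sum to 1 in the Experiment 4 model.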
The likelihood function for the data predicted by the dual-retrieval model is: 
$$L_6 = \prod_{i} (p_i)^{N(i)} \tag{A9}$$
where $p_i$ are the predicted recall probabilities on the left side of the aforementioned equations, 
and $N(i)$ are the observed data counts for the corresponding recall patterns.  The eight recall 
patterns supply seven free empirical probabilities; because six parameter estimates are obtained with 
the model, one empirical probability is free to vary.  That is, there is one degree of freedom for 
$L_6$. 
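
In practice, the product in Equation (A9) is usually evaluated on the log scale to avoid numerical underflow. The following minimal sketch shows that computation; the function name, the predicted probabilities, and the observed counts are all hypothetical and are not taken from the experiments.

```python
import math


def log_likelihood(probs, counts):
    """ln L = sum over patterns of N(i) * ln(p_i), the log of Equation (A9).

    Patterns with a count of zero contribute nothing to the sum.
    """
    return sum(n * math.log(probs[pattern])
               for pattern, n in counts.items() if n > 0)


# Hypothetical predicted probabilities (they sum to 1) and observed counts.
predicted = {"CCC": 0.50, "CCE": 0.05, "CEC": 0.05, "CEE": 0.10,
             "ECC": 0.05, "ECE": 0.05, "EEC": 0.05, "EEE": 0.15}
observed = {"CCC": 52, "CCE": 4, "CEC": 6, "CEE": 9,
            "ECC": 5, "ECE": 3, "EEC": 6, "EEE": 15}

ln_L6 = log_likelihood(predicted, observed)
```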
 To evaluate goodness of fit, I compared the likelihood in Equation (A9) to the 
likelihood of the same data when all empirical probabilities are free to vary.  The goodness-of-fit 
test is: 
$$G^2 = -2\ln[L_6 / L_7] \tag{A10}$$
where $L_6$ is the likelihood of the data predicted by the dual-retrieval model, and $L_7$ is the 
likelihood of the same data when all empirical probabilities are free to vary.  $G^2$ is 
asymptotically distributed as $\chi^2$ with one degree of freedom.  Thus, the critical value for 
rejecting the null hypothesis at the .05 significance level is 3.84.
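
The goodness-of-fit statistic in Equation (A10) can be sketched in Python as below. The sketch assumes that $L_7$ is evaluated at the observed proportions N(i)/N, which is the standard saturated-model likelihood; the function names and the example data are hypothetical.

```python
import math


def g_squared(probs, counts):
    """G^2 = -2 ln(L6 / L7) = 2 * sum_i N(i) * ln[(N(i) / N) / p_i].

    Assumes L7 uses the observed proportions N(i) / N (saturated model).
    """
    total = sum(counts.values())
    return 2.0 * sum(n * math.log((n / total) / probs[pattern])
                     for pattern, n in counts.items() if n > 0)


def chi2_df1_p_value(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


CRITICAL_VALUE = 3.84  # chi-square with 1 df at the .05 level, as stated in the text

# Hypothetical predicted probabilities and observed counts.
predicted = {"CCC": 0.50, "CCE": 0.05, "CEC": 0.05, "CEE": 0.10,
             "ECC": 0.05, "ECE": 0.05, "EEC": 0.05, "EEE": 0.15}
observed = {"CCC": 52, "CCE": 4, "CEC": 6, "CEE": 9,
            "ECC": 5, "ECE": 3, "EEC": 6, "EEE": 15}

g2 = g_squared(predicted, observed)
model_rejected = g2 > CRITICAL_VALUE
```

A model whose predicted probabilities equal the observed proportions gives $G^2 = 0$, and larger values indicate worse fit.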
 
Appendix B 
Pair type Cue Target Pair type Cue Target Pair type Cue Target 
Strong spoon fork Weak pliers tweezers Identical ladder ladder 
Strong quack duck Weak cup mug Identical nuts nuts 
Strong crocodile alligator Weak tomb coffin Identical cafe cafe 
Strong porpoise dolphin Weak flea insect Identical crown crown 
Strong lips kiss Weak beard trim Identical toast toast 
Strong daisy flower Weak scalp bald Identical skirt skirt 
Strong gate fence Weak spade diamond Identical stamp stamp 
Strong jam jelly Weak tent woods Identical ham ham 
Strong bunny rabbit Weak handbag wallet Identical tray tray 
Strong grandpa grandma Weak plaster ceiling Identical caravan caravan 
Strong sock shoe Weak collar blouse Identical basket basket 
Strong pull push Weak stove pipe Identical battery battery 
Strong anchor boat Weak trash bag Identical pyramid pyramid 
Strong salad lettuce Weak van bus Identical wedding wedding 
Strong jigsaw puzzle Weak hurt cry Identical ivory ivory 
Strong toaster oven Weak icing chocolate Identical cigar cigar 
Strong hospital sick Weak cloth shirt Identical lion lion 
Strong lime lemon Weak animals soft Identical hay hay 
Strong parcel package Weak alley lane Identical string string 
Strong niece nephew Weak leather purse Identical fiber fiber 
Strong circus clown Weak barley soup Identical suburb suburb 
Strong mustard ketchup Weak blow balloon Identical swim swim 
Strong atom bomb Weak compass ruler Identical clock clock 
Strong squid octopus Weak alcohol vodka Identical boss boss 
Strong tornado hurricane Weak dancer belly Identical blonde blonde 
Strong salt pepper Weak cream whip Identical jungle jungle 
Strong cod fish Weak vein vessel Identical tooth tooth 
Strong nest bird Weak chaos headache Identical bat bat 
Strong bull cow Weak flap seal Identical mansion mansion 
Strong mice rat Weak gymnast tumble Identical bronze bronze 
Strong tractor trailer Weak runner blade Identical deaf deaf 
Strong verb noun Weak penguin cute Identical thicket thicket 
 
  
Appendix C 
Cue-target relation Cue Target Cue-target relation Cue Target 
Related idiot stupid Unrelated brush coffee 
Related porpoise dolphin Unrelated broom dog 
Related wrong right Unrelated crawl bread 
Related shore beach Unrelated cube nurse 
Related officer police Unrelated envelope violin 
Related trash garbage Unrelated fork biology 
Related stem flower Unrelated kind grass 
Related joke laugh Unrelated minister cut 
Related empty full Unrelated grape pot 
Related bed sleep Unrelated orchestra smell 
Related daughter son Unrelated strand tool 
Related crust pie Unrelated shallow toy 
Related nephew niece Unrelated alcohol ghost 
Related cheek fat Unrelated sow liberty 
Related creek river Unrelated bone house 
Related corridor hall Unrelated butterfly beat 
Related album record Unrelated carbon throw 
Related grandpa grandma Unrelated galaxy squirrel 
Related swift fast Unrelated heavy garlic 
Related compass direction Unrelated cocktail cement 
Related coach team Unrelated balloon sex 
Related picture frame Unrelated business mosquito 
Related crops corn Unrelated knowledge reptile 
Related charm bracelet Unrelated guest banana 
Related ruby red Unrelated literature quack 
Related emerald green Unrelated tight farm 
Related comedian funny Unrelated acrobat verb 
Related reflection mirror Unrelated author clarinet 
Related crowd people Unrelated basement forest 
Related honey sweet Unrelated biscuit sad 
Related fright scare Unrelated contract foot 
Related conductor train Unrelated dandruff rage 
Related birth death Unrelated virus dream 
Related monument statue Unrelated jewel man  
Related easy hard Unrelated lamb meeting 
Related credit card Unrelated host up 
Related temple church Unrelated fair ski 
Related cough cold Unrelated pet art 
Related yarn knit Unrelated sea goat 
Related shooting gun Unrelated sin kite 
  
Appendix D 
Target-target relation Cue Target Target-target relation Cue Target 
Related forest uncle Unrelated quack knife 
Related sob aunt Unrelated quill church 
Related joke nephew Unrelated toe noun 
Related sad grandmother Unrelated boulder chair 
Related kite diamond Unrelated pony leg 
Related scissors ruby Unrelated calf apple 
Related stumble pearl Unrelated sail gun 
Related paste emerald Unrelated globe president 
Related filth bus Unrelated pilot house 
Related ink plane Unrelated comb beer 
Related gift boat Unrelated pine robbery 
Related chalk train Unrelated icing ruler 
Related daisy doll Unrelated pen monk 
Related queen ball Unrelated flood ketchup 
Related pigeon puzzle Unrelated slip wood 
Related lamp block Unrelated chapel yard 
Related convent steel Unrelated pupil water 
Related rectangle iron Unrelated tale jazz 
Related circus bronze Unrelated tickle corn 
Related toad lead Unrelated tangerine boot 
Related trout magazine Unrelated assist beetle 
Related yacht journal Unrelated profit tulip 
Related verb novel Unrelated jet diabetes 
Related vase encyclopedia Unrelated robin Christmas 
Related library soldier Unrelated lumber cruise 
Related nail private Unrelated atom salmon 
Related scent colonel Unrelated photo python 
Related spoon officer Unrelated peel wine 
Related wallet dog Unrelated transportation hat 
Related web cat Unrelated salt butterfly 
Related stone horse Unrelated soil guitar 
Related cradle lion Unrelated tall lime 
Related despise cotton Unrelated afraid pencil 
Related swift silk Unrelated yolk purse 
Related cathedral polyester Unrelated sock tango 
Related leaf wool Unrelated physician ferry 
Related pet blue Unrelated circle spade 
Related umbrella red Unrelated kitten pepper 
Related dusk green Unrelated empty lemonade 
Related hammer yellow Unrelated macaroni policeman 
  
Appendix E 
Blocked lists: 
Categorical label  A natural earth formation  A vegetable  A four-footed animal  A part of a building  A musical instrument
List words         valley                     potato       tiger                 office                drum
                   river                      squash       horse                 stairs                guitar
                   canyon                     pepper       rabbit                lobby                 flute
                   volcano                    lettuce      giraffe               ceiling               piano
                   ocean                      radish       elephant              window                trumpet
                   cliff                      carrot       moose                 elevator              clarinet
                   island                     tomato       squirrel              basement              violin
                   stream                     cabbage      raccoon               floor                 cello
 
Randomized lists: 
List label  List 1    List 2    List 3    List 4    List 5
List words  lettuce   squash    valley    drum      ocean
            river     rabbit    giraffe   tomato    tiger
            trumpet   pepper    guitar    basement  radish
            ceiling   elevator  flute     squirrel  office
            moose     elephant  carrot    cabbage   cello
            canyon    piano     volcano   horse     stream
            raccoon   lobby     island    floor     clarinet
            stairs    potato    violin    cliff     window
 