CHANGED-GOAL OR CUE-STRENGTHENING? EXAMINING JUDGMENT OF LEARNING REACTIVITY THROUGH THE LENS OF THE DUAL-RETRIEVAL MODEL

A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

by
Minyu Chang
May 2022

© 2022 Minyu Chang

CHANGED-GOAL OR CUE-STRENGTHENING? EXAMINING JUDGMENT OF LEARNING REACTIVITY THROUGH THE LENS OF THE DUAL-RETRIEVAL MODEL

Minyu Chang, Ph.D.
Cornell University 2022

Recent evidence suggests that making judgments of learning (JOLs) can directly modify subsequent memory performance, a phenomenon referred to as JOL reactivity. The present dissertation examined the underlying mechanism of JOL reactivity by (a) testing the two major theoretical explanations for JOL reactivity, the changed-goal hypothesis and the cue-strengthening hypothesis, and (b) pinpointing the retrieval processes that are modified by JOLs by applying the dual-retrieval model. The changed-goal hypothesis assumes that JOLs highlight differences in learning difficulty among to-be-remembered items and switch learners’ goals from mastering all items to focusing more on easier items at the expense of harder items, thus producing negative reactivity for the latter. The cue-strengthening hypothesis posits that the act of making JOLs strengthens the cues that inform JOLs, thus producing positive reactivity when later memory tests are sensitive to the strengthened cues. In Experiment 1, I compared the reactive effects of item-level JOLs on associative recall among three types of word pairs that differ in learning difficulty: strongly related, weakly related, and identical pairs. In Experiment 2, I tested whether prestudy JOLs produced reactive effects on associative recall for related pairs similar to those of immediate JOLs.
In Experiment 3, I investigated whether JOL reactivity was moderated by inter-item relation (word pairs whose targets were either semantically related or unrelated), JOL type (item-level or list-level), and test format (associative or free recall). In Experiment 4, I examined whether the reactivity of item-level and list-level JOLs was moderated by list organization (blocked or randomized) in free recall of categorized word lists. The experiments offered converging support for the cue-strengthening hypothesis rather than for the changed-goal hypothesis. Moreover, although positive JOL reactivity was always accompanied by improvements in recollection of item-specific verbatim details, it was sometimes also supported by enhancements in non-recollective operations (reconstruction and familiarity). In particular, the process-level mechanism of JOL reactivity varied with material type, JOL type, and test format, which is consistent with the cue-strengthening hypothesis. Finally, I recommend a contextual framework for further investigations into JOL reactivity.

BIOGRAPHICAL SKETCH

Minyu Chang was born on October 1, 1995, in Hunan Province, China. In 2017, she received a Bachelor of Social Science degree (with first honor distinction) from the University of Hong Kong, where she majored in Psychology and minored in Cognitive Science. In the same year, she joined Dr. Charles Brainerd’s Memory and Neuroscience Lab as a Ph.D. student. During her doctoral program, her research has revolved around three connected topics (episodic memory, metamemory, and cognitive aging), with the overarching goal of understanding the cognitive and metacognitive processes that govern learning and memory and the developmental changes in these processes across the human life span.
Specifically, she has used behavioral experimentation and mathematical modeling to understand the semantic factors that affect memory processes, the metacognitive processes that regulate learning, and the developmental and disease trajectories of cognition during late adulthood. The present dissertation is part of her work on metamemory.

ACKNOWLEDGMENTS

First, I would like to thank my supervisor, Dr. Charles Brainerd, for his wonderful mentorship, not only on my dissertation work but throughout the doctoral program. Dr. Brainerd has offered me everything I could expect from an advisor: He gives me the freedom and guidance to explore my own research ideas, and he is always available and supportive when I need help. I have learned so much from him about how to be a researcher, a lab leader, and a supervisor. I would also like to thank my committee members, Dr. Valerie Reyna and Dr. Adam Anderson, who have offered me generous support and helpful feedback. Additionally, I would like to thank the undergraduate research assistants in the Memory and Neuroscience Lab. The experiments would not have been possible without their tremendous help with material preparation and data organization.

Finally, I would like to extend my deepest gratitude to my father, Chun Chang, for raising and educating me so that I could become who I am now, and for always valuing my health and happiness more than my achievements. I am also immensely grateful to my mother, Jie Liu, who brought me into the world and offered me unconditional love. I will always miss her, and I hope I have made her proud. I would also like to thank my boyfriend, Yucheng Zhu, for offering a sympathetic ear during stressful times and for always having faith in me, even when I lacked faith in myself.

TABLE OF CONTENTS

BIOGRAPHICAL SKETCH
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTERS
CHAPTER 1. INTRODUCTION
  JOLs as Metamemory Judgments and Memory Modifiers
  Theoretical Explanations of JOL Reactivity
    Changed-Goal Hypothesis
    Cue-Strengthening Hypothesis
  Significance and Implications of JOL Reactivity
  The Dual-Retrieval Model
  An Overview of Present Experiments
CHAPTER 2. EXPERIMENT 1
  Method
    Participants
    Materials
    Procedure
  Results
    ANOVA Results for JOLs
    ANOVA Results for Recall
    Model Results
  Discussion
CHAPTER 3. EXPERIMENT 2
  Method
    Participants
    Materials
    Procedure
  Results
    ANOVA Results for JOLs
    ANOVA Results for Recall
    Model Results
  Discussion
CHAPTER 4. EXPERIMENT 3
  Method
    Participants
    Materials
    Procedure
  Results
    ANOVA Results for JOLs
    ANOVA Results for Associative Recall
    ANOVA Results for Free Recall
    Model Results for Associative Recall
    Model Results for Free Recall
  Discussion
CHAPTER 5. EXPERIMENT 4
  Method
    Participants
    Materials
    Procedure
  Results
    ANOVA Results for JOLs
    ANOVA Results for Recall
    Model Results
  Discussion
CHAPTER 6. GENERAL DISCUSSION
  Summary of Main Methodologies, Hypotheses, and Behavioral Findings
  Process-Level Mechanisms for JOL Reactivity
  Theoretical Implications and Future Directions
    A Contextual Framework for Understanding JOL Reactivity
    Implications for Other Encoding Tasks
    Questions That Remain to Be Answered
  Concluding Comments
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E

LIST OF FIGURES

Figure 2.1. Associative Recall in Experiment 1
Figure 3.1. JOLs in Experiment 2
Figure 3.2. Associative Recall in Experiment 2
Figure 4.1. An Overview of the Experiment Design of Experiment 3
Figure 4.2. Associative Recall in Experiment 3
Figure 4.3. Free Recall in Experiment 3
Figure 5.1. Free Recall in Experiment 4

LIST OF TABLES

Table 1.1. A Summary of the Two Major Theoretical Explanations
Table 1.2. Definitions for the Dual-Retrieval Model Parameters
Table 1.3. A Summary of Theoretical Predictions for Experiments 1-4
Table 2.1. Dual-Retrieval Model Fits and Parameter Estimates for Experiment 1
Table 3.1. Dual-Retrieval Model Fits and Parameter Estimates for Experiment 2
Table 4.1. Dual-Retrieval Model Fits and Parameter Estimates for Experiment 3
Table 5.1. Dual-Retrieval Model Fits and Parameter Estimates for Experiment 4
Table 6.1. A Summary of Designs and Recall Findings for Experiments 1-4
Table 6.2. A Summary of Dual-Retrieval Model Findings for Experiments 1-4
CHAPTER 1
INTRODUCTION

Judgments of learning (JOLs) refer to people’s predictions of their future memory performance for the currently studied materials, which reflect their metacognitive monitoring of their own learning processes. Traditionally, JOLs were assumed to assess, but not alter, the underlying learning processes. However, accumulating research has demonstrated that the solicitation of JOLs can directly modify subsequent memory performance (Janes et al., 2018; Mitchum et al., 2016; Myers et al., 2020; Rivers et al., 2021; Senkova & Otani, 2021; Soderstrom et al., 2015; Tekin & Roediger, 2020; Witherby & Tauber, 2017b; Yang et al., 2015; Zhao et al., 2021). Such memory effects are referred to as JOL reactivity (for reviews, see Double et al., 2018; Double & Birney, 2019): Positive JOL reactivity means that the solicitation of JOLs improves subsequent memory, and negative JOL reactivity means that it impairs subsequent memory. Several explanations have been proposed for JOL reactivity; the most widely studied among them are the changed-goal hypothesis (Mitchum et al., 2016) and the cue-strengthening hypothesis (Soderstrom et al., 2015). However, because both hypotheses are recent, their predictive and explanatory power remains to be determined.

The primary aim of my dissertation was to test the predictions of the changed-goal hypothesis and the cue-strengthening hypothesis in a series of hypothesis-driven experiments. Another objective was to investigate the process-level mechanism of JOL reactivity. To achieve this, I used the dual-retrieval model to pinpoint the specific retrieval processes that are modified by the solicitation of JOLs. In the current chapter, I first provide a brief review of how JOLs have traditionally been studied and of the recent evidence for JOL reactivity. Then, I discuss the major theoretical accounts of JOL reactivity and the significance of studying this effect.
After that, I explain the dual-retrieval model, which was used in the present dissertation to estimate the underlying retrieval processes. Last, I provide an overview of the four experiments in the present dissertation.

JOLs as Metamemory Judgments and Memory Modifiers

In the classic monitoring and control framework of metacognition (T. O. Nelson & Narens, 1990), cognitive processes are split into two interrelated levels: the meta-level and the object-level. Information flows from the object-level to the meta-level via monitoring processes, and, based on these inputs, the meta-level in turn regulates the object-level via control processes. Accordingly, metacognitive monitoring is vital for learning performance, as it can guide learners in allocating study time, regulating cognitive resources, and revising study strategies (Dunlosky & Ariel, 2011; Kornell & Bjork, 2008; Metcalfe & Finn, 2008).

Metacognitive monitoring is usually measured by introspective self-reports. JOLs are a typical example of such measures: They ask participants to assess their learning outcomes during encoding. The most common form of JOLs is item-level JOLs. Typically, participants study a series of single words, word pairs, or other materials, and at the end of each study trial they are asked to judge the likelihood of remembering the item on a later memory test. A less studied form is aggregate JOLs (e.g., Mazzoni & Nelson, 1995; Stevens & Pierce, 2019), in which participants provide a global assessment for a set of studied materials, such as how many items from the preceding set they expect to remember on a later memory test. Unless otherwise specified, “JOLs” refers to item-level JOLs throughout this chapter.
A tacit assumption underlying the use of introspective self-report methods is that they merely monitor a given cognitive process without affecting it (Soderstrom et al., 2015). In other words, there should be no reactivity. In the context of JOLs, it is often implicitly assumed that the solicitation of JOLs does not modify the learning processes that JOLs monitor and hence has no direct effect on later memory performance. However, this assumption is challenged by a large body of literature on common encoding tasks (e.g., deep processing, survival processing). For instance, research on the level-of-processing effect has shown that asking participants to make judgments about the semantic content of study materials (deep processing; e.g., rating the pleasantness of each word on a numeric scale) can produce robust memory benefits (Bower et al., 1974; Craik & Tulving, 1975). Similarly, research on survival processing indicates that requiring participants to rate vocabulary words for their relevance to a survival scenario (e.g., being trapped in a grassland) results in better retention of those words (Nairne et al., 2007). Note that JOLs bear considerable similarity to these common encoding tasks: They all require participants to make certain judgments about the study materials during encoding. Thus, there is reason to suspect that JOLs, like the other encoding tasks, can directly affect subsequent memory performance.

Indeed, the findings of some early JOL studies contradicted the no-reactivity assumption. Arbuckle and Cuddy (1969; Experiment 2) found that recall for word pairs was better for participants who were asked to make JOLs than for participants who were not. Furthermore, King et al.
(1980) reported that after a series of study trials with word pairs, participants who made JOLs and participants who had an opportunity to restudy the word pairs displayed comparable memory performance on the final test. This suggests that making JOLs can enhance later memory performance about as much as additional study trials do.

Among later studies that included both a JOL condition and a no-JOL control condition, some did not find evidence for JOL reactivity (Ariel et al., 2021; Benjamin et al., 1998; Dougherty et al., 2018; Kelemen & Weaver III, 1997; Kornell & Bjork, 2008; Tauber & Rhodes, 2012), whereas many others did (Dougherty et al., 2005; Janes et al., 2018; Mitchum et al., 2016; Myers et al., 2020; Rivers et al., 2021; Senkova & Otani, 2021; Soderstrom et al., 2015; Tauber & Witherby, 2019; Tekin & Roediger, 2020; Witherby & Tauber, 2017b; Yang et al., 2015; Zechmeister & Shaughnessy, 1980; Zhao et al., 2021). Double et al. (2018) reported a meta-analysis of 17 experiments of this sort. Their results showed moderate positive JOL reactivity for related word pairs and single-word lists, but no reactivity for unrelated word pairs. However, as Double et al. noted, the results for word lists should be interpreted with caution, as only three experiments using word lists were included in the meta-analysis.

It should be noted that many of the studies cited above were not designed to evaluate JOL reactivity, and the resulting lack of methodological standardization may be responsible for the mixed findings. For example, Dougherty et al. (2005, 2018) administered a recall test prior to the solicitation of JOLs, so participants already had test experience with the to-be-tested word pairs when making JOLs. Additionally, in Tauber and Rhodes (2012), JOLs were always followed by restudy choices.
Similarly, in Kornell and Bjork (2008), JOLs were made after “drop” decisions (i.e., decisions to set aside and stop studying items that one already knows) and were solicited only for the dropped items. In all of these studies, retrieval- or judgment-type tasks were administered before JOLs, which could have masked JOL reactivity. Thus, a systematic investigation of JOL reactivity requires more focused experimentation.

Mitchum et al. (2016) and Soderstrom et al. (2015) are two of the most influential studies to have systematically investigated JOL reactivity. Based on their findings, they proposed two different theoretical explanations for JOL reactivity, which are discussed in the next section.

Mitchum et al. (2016) found that JOLs function as a memory modifier primarily through negative reactivity. In their first three experiments, JOL solicitation and the cue-target relation of word pairs were factorially manipulated, with the former manipulated between subjects and the latter within subjects. Mitchum et al. demonstrated that when related and unrelated word pairs were studied in a mixed list, the solicitation of JOLs weakened the correlation between cue-target relatedness and self-paced study time, suggesting that the overall tendency to allocate less study time to related pairs and more study time to unrelated pairs was reduced relative to when no JOLs were requested. Consequently, the discrepancy in memory performance between related and unrelated pairs was larger in the JOL condition than in the no-JOL condition, which was largely driven by negative JOL reactivity for unrelated pairs. In addition, Mitchum et al. showed in Experiment 4 that negative JOL reactivity disappeared when participants studied a pure list of unrelated word pairs, where there were no salient cues for relative item difficulty.
Moreover, in Experiment 5, when study time was experimenter-paced rather than self-paced (as in Experiments 1-4), the difference in memory performance between related and unrelated pairs was still larger in the JOL condition than in the no-JOL condition. Therefore, the gist of Mitchum et al.’s findings is that making JOLs produces a greater discrepancy in memory performance between related and unrelated word pairs, mainly because of negative reactivity for unrelated word pairs. Mitchum et al. hypothesized that negative JOL reactivity arises from participants’ adjustment of study strategy based on relative item difficulty. That is, when prompted to make JOLs, participants tend to allocate less time to relatively hard items and more time to easily remembered items than when JOLs are not solicited.

In contrast, Soderstrom et al. (2015) reported that JOLs modify memory mainly through positive reactivity. In their Experiments 1a and 1b, Soderstrom et al. similarly manipulated JOL solicitation between subjects and the cue-target relation of word pairs within subjects. They found that recall for strongly related word pairs was enhanced in the JOL condition compared to the no-JOL condition. However, recall for weakly related or unrelated pairs did not differ between the two conditions. In their Experiment 2, Soderstrom et al. manipulated JOL solicitation between subjects and generation condition (read versus generate) within subjects. Specifically, in the JOL condition, participants made JOLs only for the read items, not for the generated items. The presence of JOLs attenuated, but did not eliminate, the difference in recall between read and generated items. To sum up, the main takeaway of Soderstrom et al.’s results is that making JOLs produces positive reactivity for strongly related word pairs, probably because JOLs enhance the processing of the cue-target relation, similar to the generation task.
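The comparisons reviewed above rest on two simple contrasts, which can be sketched in a few lines of Python under the assumption that recall is summarized as the mean proportion of items recalled for each item type in each JOL condition. JOL reactivity for an item type is recall in the JOL condition minus recall in the no-JOL condition, and the change in the discrepancy between easier and harder items is the JOL-condition gap minus the no-JOL gap. The names used here (`RecallPair`, `reactivity`, `discrepancy_change`, `direction`, `summarize`) and the recall values in the example are hypothetical illustrations, not the analyses or data of Mitchum et al. (2016), Soderstrom et al. (2015), or this dissertation; a real analysis would evaluate these contrasts statistically rather than reading off their signs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RecallPair:
    """Hypothetical container: mean proportion recalled for one item type
    in the JOL and no-JOL conditions."""
    jol: float     # recall when JOLs were solicited
    no_jol: float  # recall in the no-JOL control condition


def reactivity(recall: RecallPair) -> float:
    """JOL reactivity for one item type: JOL recall minus no-JOL recall.
    Positive values mean JOLs helped; negative values mean they hurt."""
    return recall.jol - recall.no_jol


def discrepancy_change(easier: RecallPair, harder: RecallPair) -> float:
    """Change in the easier-minus-harder recall gap from the no-JOL to the
    JOL condition (an interaction contrast). Algebraically equal to
    reactivity(easier) - reactivity(harder)."""
    gap_jol = easier.jol - harder.jol
    gap_no_jol = easier.no_jol - harder.no_jol
    return gap_jol - gap_no_jol


def direction(value: float, tol: float = 1e-9) -> str:
    """Label an effect by its sign only; a real analysis would test it
    statistically rather than compare it with zero."""
    if math.isclose(value, 0.0, abs_tol=tol):
        return "none"
    return "positive" if value > 0 else "negative"


def summarize(easier: RecallPair, harder: RecallPair) -> dict:
    """Direction of reactivity for each item type plus the discrepancy change."""
    return {
        "easier": direction(reactivity(easier)),
        "harder": direction(reactivity(harder)),
        "discrepancy_change": round(discrepancy_change(easier, harder), 3),
    }


if __name__ == "__main__":
    # Hypothetical recall proportions, not data from any study: no reactivity
    # for easier (related) pairs, negative reactivity for harder (unrelated) pairs.
    related = RecallPair(jol=0.80, no_jol=0.80)
    unrelated = RecallPair(jol=0.30, no_jol=0.40)
    print(summarize(related, unrelated))
```

Because the discrepancy change equals reactivity for easier items minus reactivity for harder items, an enlarged discrepancy by itself does not reveal which item type drives it: It could reflect positive reactivity for easier items, negative reactivity for harder items, or both. This is why the two reactivity scores have to be examined separately, as Mitchum et al. did in attributing the enlarged discrepancy mainly to negative reactivity for unrelated pairs.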
Thereafter, positive JOL reactivity was replicated with different study materials, test formats, experimental manipulations, and populations. In terms of study materials, positive JOL reactivity was found not only with related word pairs but also with categorized word lists (Senkova & Otani, 2021). In terms of test formats, it was established with recognition tests (Myers et al., 2020) and delayed testing (Witherby & Tauber, 2017b). In terms of experimental manipulations, positive JOL reactivity proved robust when JOL solicitation was manipulated within subjects (Rivers et al., 2021; Yang et al., 2015) and within the depth-of-processing paradigm (Tekin & Roediger, 2020). Moreover, positive JOL reactivity has recently been found in elementary school children (Zhao et al., 2021). On the other hand, negative JOL reactivity has been observed in a few other studies (Double, 2019; Janes et al., 2018; Rivers et al., 2021; Schäfer & Undorf, 2021), although it only approached conventional significance in Janes et al. (2018).

Theoretical Explanations for JOL Reactivity

Some researchers have speculated that JOL reactivity may simply arise from the additional study time afforded by making JOLs (King et al., 1980; Rhodes, 2016). This possibility was ruled out by subsequent replications of JOL reactivity in which study time was controlled between the JOL and no-JOL conditions (e.g., Janes et al., 2018; Myers et al., 2020; Soderstrom et al., 2015). Thus far, multiple theoretical hypotheses have been proposed to explain how JOLs modify people’s processing of to-be-remembered materials and ultimately affect subsequent memory performance. Below I discuss two major hypotheses for JOL reactivity: the changed-goal hypothesis (Mitchum et al., 2016) and the cue-strengthening hypothesis (Soderstrom et al., 2015). The core ideas of the two hypotheses and the evidence for and against them are summarized in Table 1.1.
Changed-Goal Hypothesis

The changed-goal hypothesis (Mitchum et al., 2016) posits that the presence of JOLs amplifies the perception of differences in learning difficulty among the to-be-remembered items, making people switch their learning goals from mastery-oriented to performance-oriented. According to the discrepancy-reduction model of self-regulated learning (Dunlosky & Hertzog, 1998), people tend to adopt mastery as a goal in normal learning situations, in which they allocate more time to studying harder items than to studying easier items. However, when participants are prompted to make JOLs, they gain a heightened awareness that some items are more likely to be remembered than others, and hence they may switch their goal from mastering as many items as possible to focusing on remembering the relatively easy items, namely, a performance-oriented goal. According to the region of proximal learning framework (Metcalfe & Kornell, 2005), adopting a performance-oriented goal drives people to allocate more resources to remembering relatively easy and moderately challenging items at the expense of the most difficult items. Therefore, the changed-goal hypothesis predicts that making JOLs will increase the discrepancy in memory performance between easier and harder items, largely through negative JOL reactivity for the latter.

The changed-goal hypothesis clearly predicts negative JOL reactivity for relatively difficult items. Does it predict any JOL reactivity for relatively easy items? Mitchum et al. (2016) did not detect such an effect. However, given the rationale behind the changed-goal hypothesis (namely, that the switch in learning goals prompts people to emphasize learning of easier items at the cost of harder items), the hypothesis should also predict positive reactivity for relatively easy items. Many other researchers have interpreted the changed-goal hypothesis in this way as well (e.g., Myers et al., 2020; Tekin & Roediger, 2020).
Still, negative JOL reactivity for relatively difficult items offers stronger support for the changed-goal hypothesis, because it is the hypothesis’s most distinctive prediction.

The changed-goal hypothesis has received support from some experimental results. For instance, in Janes et al.’s (2018) Experiment 1, JOL solicitation was manipulated between subjects and the cue-target relation of word pairs within subjects, similar to both Mitchum et al. (2016) and Soderstrom et al. (2015). Importantly, Janes et al. added another manipulation: Study was self-paced for half of the participants and experimenter-paced for the other half. Their results showed that JOL reactivity was attenuated in the self-paced condition compared to the experimenter-paced condition. This was consistent with the changed-goal hypothesis: Participants in the experimenter-paced condition, who had only limited study time, were more likely to focus on easier items at the expense of harder items, whereas participants in the self-paced condition, who had unlimited study time to master all items, should have been less likely to switch from a mastery-oriented to a performance-oriented goal (Son & Metcalfe, 2000). Further, in Experiment 2, Janes et al. found that JOL reactivity was reliable only when participants studied a mixed list of related and unrelated pairs, not when they studied a pure list of related pairs or unrelated pairs. This was again consistent with the changed-goal hypothesis, according to which learning goals should not switch in the absence of salient cues for relative item difficulty. However, Janes et al. offered only partial support for the changed-goal hypothesis, as they did not replicate the decreased correlation between pair relatedness and study time.

Moreover, findings from perceptual disfluency research provide some indirect support for the changed-goal hypothesis.
Perceptually disfluent items (e.g., items that are backward masked, printed in a cursive or smaller font, or blurred) are usually given lower JOLs than items presented in a normal format, because they are perceived to be harder to remember. When no JOLs are elicited, such materials have been found to improve later memory performance, possibly by provoking elaborative processing. However, this mnemonic benefit of perceptual disfluency was eliminated by the presence of JOLs (Besken & Mulligan, 2013; Geller, 2017; Halamish, 2018; Rosner et al., 2015). This seems consistent with the changed-goal hypothesis’ prediction that JOLs prompt participants to enhance processing of items that are perceived to be relatively easy (i.e., fluent items) and to reduce processing of items that are perceived to be relatively difficult (i.e., disfluent items), which thus erases the benefits of the elaborative processing provoked by perceptual disfluency. Nevertheless, Tekin and Roediger’s (2020) recent findings undercut the changed-goal hypothesis. These authors reported that the levels-of-processing effect was attenuated in the JOL condition compared to the no-JOL condition. Specifically, JOLs improved recognition for items in both the shallow (phonetically oriented) and deep (semantically oriented) processing tasks. However, JOL reactivity was significantly larger for items in the shallow processing task than in the deep processing task, even though the former items were perceived as harder to remember (i.e., received lower JOLs) than the latter. This is contrary to the changed-goal hypothesis, which assumes that relatively easy items (deeply processed items) should benefit more from making JOLs than relatively hard items (shallowly processed items). In addition, Ikeda et al. (2016) also provided some evidence against the changed-goal hypothesis.
These authors instructed participants to study four types of word pairs (unrelated, weakly related, strongly related, and identical pairs) under either performance-oriented or mastery-oriented instructions. They found no difference in recall performance between the performance- and mastery-oriented groups. Importantly, there was no interaction between word pair type and goal orientation in either study time or recall performance, suggesting that different goal orientations did not modify participants’ study time allocation or memory performance. This clearly contradicts the changed-goal hypothesis’s assumption that JOL reactivity results from participants’ changing their learning goals from mastery-oriented to performance-oriented.
Cue-Strengthening Hypothesis
Soderstrom et al. (2015) proposed another account, the cue-strengthening hypothesis, which was developed from the cue-utilization framework of JOLs (Koriat, 1997) and the transfer-appropriate multifactor account of generation effects (de Winstanley et al., 1996). The cue-utilization framework (Koriat, 1997) suggests that JOLs are based on a variety of cues that are available during encoding. Specifically, Koriat distinguished three types of cues: intrinsic cues that are embedded in the to-be-remembered items (e.g., concreteness, word relatedness), extrinsic cues that concern the learning conditions or the encoding processes applied by learners (e.g., presentation duration, repetition), and mnemonic cues that are based on internal, subjective experience (e.g., processing fluency, familiarity). Meanwhile, according to the transfer-appropriate multifactor account of generation effects, generation strengthens the information that is used in the generation task, and thus it improves memory performance when such information is useful on the later memory test (de Winstanley et al., 1996).
Combining these two accounts, the cue-strengthening hypothesis posits that making JOLs strengthens the cues that participants draw upon when making the judgments, and that JOLs enhance subsequent memory performance if the later memory test is sensitive to the strengthened cues. Myers et al. (2020) recently provided evidence supporting the cue-strengthening hypothesis. In their four experiments, participants studied related or unrelated word pairs in the study phase and took an associative recall, free recall, or recognition test in the test phase. Participants were not told in advance which test format would be administered. The results showed that JOLs produced positive reactivity for related word pairs on associative recall and recognition tests, but not on free recall tests. Myers et al. reasoned that this was because JOLs enhanced processing of item-specific cues, such as the relation between cue and target within a pair or specific features of the targets. Because associative recall and recognition tests are both sensitive to such cues, performance on these two types of tests is enhanced by JOLs. Free recall tests, however, are more sensitive to inter-item relations than to item-specific cues, which makes JOLs less beneficial for free recall; thus, no reactivity was observed on free recall tests. On a related note, Senkova and Otani (2021) hypothesized that making JOLs enhances memory specifically by strengthening item-specific cues. In their experiments, Senkova and Otani factorially manipulated JOL condition (JOL, no-JOL) and list type (categorized, uncategorized) between subjects and found that JOLs enhanced free recall performance for categorical lists but not for uncategorized lists. They explained that the positive reactive effects of JOLs arise from enhanced item-specific processing, which complements the relational processing promoted by categorical lists.
This explanation was supported by their finding that the recall enhancement produced by JOLs was comparable to that produced by two classic manipulations known to induce item-specific processing (Experiment 1: pleasantness rating; Experiment 2: mental imagery). Thus, they concluded that JOLs improve subsequent memory performance by specifically strengthening item-specific cues, which can be seen as an additional assumption added to the cue-strengthening hypothesis. Nevertheless, the results of Mitchum et al. (2016, Experiment 1) were inconsistent with the cue-strengthening hypothesis. Mitchum et al. examined whether JOL reactivity varies as a function of the associative direction of the cue-target word pair, which can be either forward (from cue to target) or backward (from target to cue). Associative recall clearly favors forward over backward association, as participants are required to produce the target word when given the cue word. Indeed, research has shown that associative recall is better for forward than for backward pairs, even though participants make comparable JOLs for these two types of pairs (Koriat & Bjork, 2005; Maxwell & Huff, 2021). Therefore, given that associative recall is more sensitive to forward than to backward association, the cue-strengthening hypothesis predicts stronger JOL reactivity for forward than for backward pairs. However, Mitchum et al. did not find any interaction between JOL condition (JOL, no-JOL) and associative direction (forward, backward), suggesting that JOL reactivity did not differ between forward and backward pairs. Moreover, as Rivers et al. (2021) commented, it would be hard for the cue-strengthening hypothesis to accommodate negative JOL reactivity (Janes et al., 2018; Mitchum et al., 2016; Schäfer & Undorf, 2021) without further assumptions.
Additionally, the cue-strengthening hypothesis would have difficulty explaining why JOL reactivity disappeared when a pure list of related or unrelated word pairs was used (Janes et al., 2018; Mitchum et al., 2016). In that connection, however, it is worth mentioning that positive JOL reactivity was successfully replicated with a pure list of related pairs in other studies (Tauber & Witherby, 2019; Witherby & Tauber, 2017b). Thus, further examination is required to determine whether JOL reactivity differs between mixed-list and pure-list designs.

Table 1.1
A Summary of the Two Major Theoretical Explanations for Judgment of Learning Reactivity

Changed-goal hypothesis
Core idea: Making JOLs switches learners’ goals from mastery-oriented to performance-oriented, which prompts them to focus more on learning easier items at the cost of harder items.
Supporting evidence:
- Negative reactivity for unrelated word pairs (Janes et al., 2018; Mitchum et al., 2016; Schäfer & Undorf, 2021)
- Decreased correlation between item difficulty and study time (Mitchum et al., 2016; but see Janes et al., 2018)
- Weaker JOL reactivity in the self-paced condition than in the experimenter-paced condition (Janes et al., 2018)
- No JOL reactivity for either related or unrelated pairs in a pure-list design (Janes et al., 2018; Mitchum et al., 2016, but see Tauber & Witherby, 2019; Witherby & Tauber, 2017b)
- The perceptual disfluency effect was eliminated in the presence of JOLs (Besken & Mulligan, 2013; Geller, 2017; Halamish, 2018; Rosner et al., 2015)
Opposing evidence:
- Weaker JOL reactivity in the deep processing condition than in the shallow processing condition (Tekin & Roediger, 2021)
- Manipulating goal orientation did not affect study time or recall performance (Ikeda et al., 2016)

Cue-strengthening hypothesis
Core idea: Making JOLs strengthens the cues that inform JOLs, which benefits memory performance if the memory test is sensitive to the strengthened cues.
Supporting evidence:
- Positive reactivity for related pairs but not for unrelated pairs (e.g., Soderstrom et al., 2015; Myers et al., 2021)
- The generation effect was attenuated by the solicitation of JOLs (Soderstrom et al., 2015)
- Positive reactivity for related pairs only in associative recall and recognition but not in free recall (Myers et al., 2021)
- Positive reactivity for categorical lists but not for uncategorized lists (Senkova & Otani, 2021)
Opposing evidence:
- No difference in JOL reactivity between forward and backward associative pairs (Mitchum et al., 2016)
- No JOL reactivity for related pairs in a pure-list design (Janes et al., 2018; Mitchum et al., 2016, but see Tauber & Witherby, 2019; Witherby & Tauber, 2017b)

Significance and Implications of JOL Reactivity
Given that JOLs are one of the most commonly used measures in metacognition research, JOL reactivity is an important phenomenon both empirically and theoretically. First, JOL reactivity often manifests itself as an interaction between JOL solicitation and other experimental manipulations. Consequently, the memory effects of those manipulations may be artificially magnified or reduced in the presence of JOLs. On the one hand, JOLs clearly exaggerated the memory effect of pair relatedness, inflating recall differences between related and unrelated pairs (Mitchum et al., 2016; Soderstrom et al., 2015). On the other hand, the memory benefits of perceptual disfluency were wiped out by the presence of JOLs (Besken & Mulligan, 2013; Geller, 2017; Halamish, 2018; Rosner et al., 2015). Therefore, researchers who hope to estimate the “pure” memory effects of certain manipulations should carefully consider whether to include JOLs in the experimental design, or should include a control condition without JOLs.
On a related note, when more than one factor is manipulated simultaneously, JOLs sometimes strengthen the memory effects of some factors but not others. For example, in Mitchum et al. (2016, Experiment 2), emotional valence and pair relatedness were factorially manipulated within subjects. There was an interaction between pair relatedness (related, unrelated) and JOL condition (JOL, no-JOL), driven by lower recall for unrelated pairs in the JOL condition than in the no-JOL condition. However, there was no interaction between valence and JOL condition, indicating that the valence effects were the same regardless of whether JOLs were solicited. Such preferential reactivity of JOLs would be especially problematic for research that aims to examine the interplay between multiple factors or to identify the predominant factors. For instance, some studies have manipulated both reward value and pair relatedness in word pairs (Ariel et al., 2009; Soderstrom & McCabe, 2011; Yu et al., 2020). They found that both factors enhanced memory, but reward effects overrode pair relatedness effects when the two factors were in conflict (i.e., high values assigned to unrelated pairs and low values assigned to related pairs). However, all of these studies requested JOLs in the study phase. Considering that JOL reactivity may favor particular factors over others when those factors are manipulated simultaneously, it remains an open question whether those conclusions still hold in the absence of JOLs. JOL reactivity can also constrain the theoretical interpretation of findings on the relation between metacognitive judgments and memory performance. In metacognition research, the correspondence between JOLs and actual memory performance is often of principal interest, because it serves as an index of monitoring accuracy.
The fact that the memory effects of certain manipulations vary as a function of JOL solicitation implies one of two possible scenarios: asking participants to report JOLs either evokes additional metacognitive processing that would not be engaged spontaneously, or brings such processing from unconsciousness to consciousness (Double & Birney, 2019). Taking perceptual disfluency research as an illustration, disfluency manipulations have been shown to induce a dissociation between JOLs and memory performance, which is interpreted as a metacognitive illusion (Besken & Mulligan, 2013; Castel, 2008; Yue et al., 2013). However, the mere act of requesting JOLs may lead participants to engage in additional processing that counteracts the disfluency advantage. Thus, researchers should be aware that the relation between metacognitive judgments and memory performance may be artificially altered when JOLs are used to measure metamemory. Accordingly, JOL reactivity should be incorporated into current explanations of the relation between JOLs and memory.
Last, JOL reactivity also has potential educational implications. For instance, textbook writers commonly add glossaries or short quizzes at the end of each section to help learners rehearse the content. In that connection, recall that King et al. (1980) showed that making JOLs produced memory benefits comparable to those of restudying the items. Meanwhile, Ariel et al. (2021) showed that making JOLs after retrieval practice (i.e., short-answer questions) with educational materials led to even better performance on later memory tests than retrieval practice alone, suggesting that JOLs produce memory benefits independent of retrieval practice. Thus, learning outcomes may improve if self-assessment questions similar to JOLs are incorporated into the glossaries or short quizzes in textbooks.
Similarly, instructors could consider inserting JOL-like questions about key concepts into their lecture slides, which may improve students’ retention of those concepts.
The Dual-Retrieval Model
As discussed above, Senkova and Otani (2021) proposed that JOLs improve subsequent recall by strengthening item-specific processing. To test this hypothesis, they compared the memory effects of JOLs to those of two other item-specific processing tasks (pleasantness rating and mental imagery). They found comparable levels of memory enhancement in those three conditions, which was in line with their hypothesis. However, this finding offers relatively weak support for their item-specific hypothesis, because it remains unknown whether the similar memory effects arose from similar underlying processes. Answering that question requires process-level measurements that identify the specific processes modified by JOL solicitation. To that end, I implemented the dual-retrieval model (Brainerd et al., 2009; Gomes et al., 2014) in the present dissertation; the model measures the underlying retrieval processes in all conventional recall paradigms.
The dual-retrieval model is a Markov model based on fuzzy-trace theory’s (FTT) distinction between verbatim and gist traces (Brainerd & Reyna, 1998). FTT posits that people separately encode and store two types of memory traces: verbatim traces of detailed surface content and other item-specific information, and gist traces of semantic, elaborative, and relational content. Verbatim traces support errorless recollective retrieval, in which the vivid, realistic surface details of an item’s prior presentation are directly accessed and consciously reinstated. For example, if the verbatim trace of the study word “sheep” is directly accessible, the word’s prior presentation would be vividly reinstated in consciousness, as if heard with the mind’s ear or seen with the mind’s eye.
Gist traces, in contrast, support non-recollective retrieval, in which people must reconstruct items from partially identifying information (typically semantic information) when verbatim traces cannot be directly accessed. For example, if one cannot access the verbatim trace of the word “sheep” but remembers its semantic gist (e.g., some four-footed farm animal), then several candidate answers can be generated, such as horse, sheep, cow, and goat, from which a final answer is selected and output. Because people must search through a set of candidate items that fit the partially identifying information, reconstruction is fallible: people may reconstruct items that were never studied but are consistent with the semantic gist, leading to false recall. To help rule out such distractors in the search set, a familiarity judgment is applied before a reconstructed item is output; only reconstructed items that exceed a certain familiarity threshold are output.
The dual-retrieval model provides quantitative estimates of both verbatim-based recollective retrieval and gist-based non-recollective retrieval. In the model, the verbatim-based recollective operation is named direct access (D), and the gist-based non-recollective operation is named reconstruction (R), which is accompanied by a subsidiary familiarity judgment operation (J). The definitions of these parameters are presented in Table 1.2. Because the dual-retrieval model assumes that recall is driven by either recollective or non-recollective retrieval, the probability of successful recall on a single test trial is the probability of successful recollection plus the probability of successful non-recollective retrieval when recollection fails. This can be expressed as a function of the recollective and non-recollective parameters: p(C) = D + (1 - D)RJ, where C denotes correct recall.
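As a quick numerical check, the single-trial equation for p(C), together with the error equation p(E) given next, can be written as a small Python function. This is a minimal sketch of the single-test case only, under hypothetical parameter values; the function name is illustrative, and the three-test version with the forgetting parameter F and test-specific familiarity parameters J1, J2, and J3 (Appendix A) is not reproduced here.

```python
from __future__ import annotations


def single_test_probs(D: float, R: float, J: float) -> tuple[float, float]:
    """Return (p(C), p(E)) for one recall test under the dual-retrieval model.

    D: direct access (recollection)
    R: reconstruction, applied when direct access fails
    J: familiarity judgment, applied to a reconstructed item
    """
    for name, value in (("D", D), ("R", R), ("J", J)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability, got {value}")
    # Correct recall: direct access succeeds, or it fails and the item is
    # reconstructed and judged familiar enough to be output.
    p_correct = D + (1 - D) * R * J
    # Error: reconstructed but rejected as unfamiliar, or not reconstructed at all.
    p_error = (1 - D) * R * (1 - J) + (1 - D) * (1 - R)
    return p_correct, p_error


# Hypothetical parameter values chosen only for illustration.
p_c, p_e = single_test_probs(D=0.30, R=0.50, J=0.60)
print(round(p_c, 3), round(p_e, 3))  # 0.51 0.49
assert abs(p_c + p_e - 1.0) < 1e-12  # the two outcomes are exhaustive
```

Because the two probabilities always sum to 1, a single test contributes only one free data point, which is why the model cannot identify D, R, and J from one test alone.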
Similarly, the probability of unsuccessful recall can be expressed as a function of the recollective and non-recollective parameters: p(E) = (1 - D)R(1 - J) + (1 - D)(1 - R), where E denotes a recall error. However, a single test trial provides only two empirical probabilities, p(C) and p(E), and because they sum to 1 they carry only one degree of freedom, which is not enough to estimate three parameters; the parameters are therefore not identifiable. For this reason, implementation of the dual-retrieval model always requires data from at least three standard recall tests. In that case, the probability of each error-success sequence across the three tests can be expressed as a function of the recollective and non-recollective parameters, analogously to the two equations above (see Appendix A for details). Because each test yields one of two outcomes, C or E, three tests together yield 2³ = 8 empirical probabilities [i.e., p(CCC), p(CCE), p(CEC), p(CEE), p(ECC), p(ECE), p(EEC), p(EEE)], which provide enough degrees of freedom to secure identifiable parameter estimates. Three standard recall tests are the only prerequisite for using the dual-retrieval model: one can administer either three study-test cycles or three consecutive tests following a single study phase, and only slight modifications to the model are needed to accommodate such variations in experimental design. To preview, I used three consecutive recall tests following a single study phase in all experiments of the present dissertation. Thus, the specific version of the dual-retrieval model used here (Chang, 2019) includes a forgetting parameter (F), which indicates the probability that verbatim traces are forgotten after the first recall test (see Table 1.2). The detailed mathematical machinery of this version of the model can be found in Appendix A. Interested readers may also consult Gomes et al.
(2014) for a more in-depth explanation of the various versions of the dual-retrieval model.

Table 1.2
Definitions of the Dual-Retrieval Model’s Parameters
D (direct access/recollection): The probability that the verbatim trace of an item’s presentation can be directly accessed on a recall test
F (forgetting of direct access): The probability that direct access to the verbatim trace of an item’s presentation is available on the first recall test but not on the following recall tests due to forgetting
R (reconstruction): The probability that an item can be reconstructed on a recall test when the verbatim trace of the item’s presentation cannot be directly accessed
J1, J2, J3 (familiarity judgment): The probability that a reconstructed item is judged to be familiar enough to output; J1, J2, and J3 represent the familiarity judgment on the first, second, and third recall test, respectively

Although the dual-retrieval model was developed from FTT, it is often used independently of FTT, since the validity of the model can always be established by its fits to recall data. Model fit was inspected with goodness-of-fit tests based on the likelihood-ratio statistic G². In the current experiments, there were eight empirical probabilities, which sum to 1 and thus supply seven degrees of freedom, and six parameters to be estimated (D, F, R, J1, J2, J3); the model was therefore fitted with one degree of freedom. Because G²(1) is asymptotically distributed as χ²(1), goodness of fit is evaluated by comparing the observed G²(1) to the critical value of χ²(1), which is 3.84 at the .05 significance level. Here, the null hypothesis is that the predicted and observed frequencies of the eight possible error-success sequences across three tests (i.e., CCC, CCE, CEC, CEE, ECC, ECE, EEC, EEE) do not differ significantly.
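The goodness-of-fit test above, and the condition-wise ∆G² comparison described in the next paragraph, can be sketched in Python. This is a minimal illustration, assuming the model-implied sequence probabilities have already been obtained by maximum likelihood estimation (not shown; the model equations are in Appendix A). The counts and probabilities are hypothetical, and the function names are illustrative rather than part of any published software.

```python
from __future__ import annotations

import math
from itertools import product

# The eight error-success sequences across three recall tests
# (C = correct, E = error): CCC, CCE, CEC, CEE, ECC, ECE, EEC, EEE.
SEQUENCES = ["".join(seq) for seq in product("CE", repeat=3)]

# Critical value of chi-square(1) at alpha = .05, as given in the text.
CHI2_CRIT_DF1 = 3.84


def fit_df(n_cells: int = len(SEQUENCES), n_params: int = 6) -> int:
    """Degrees of freedom: the cell proportions sum to 1, so one cell is not free."""
    return (n_cells - 1) - n_params


def g_squared(observed: dict[str, int], predicted: dict[str, float]) -> float:
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)) over the sequences.

    `observed` holds sequence counts; `predicted` holds model-implied
    probabilities for the same sequences (they should sum to 1).
    """
    n = sum(observed[s] for s in SEQUENCES)
    total = 0.0
    for s in SEQUENCES:
        o = observed[s]
        if o == 0:
            continue  # empty cells contribute 0 (limit of O * ln O as O -> 0)
        total += o * math.log(o / (n * predicted[s]))
    return 2.0 * total


def delta_g2_test(g2_restricted: float, g2_unrestricted: float) -> tuple[float, bool]:
    """Delta G^2 for one equality constraint; True means the parameter differs."""
    delta = g2_restricted - g2_unrestricted
    return delta, delta > CHI2_CRIT_DF1


# Hypothetical counts and model-implied probabilities, for illustration only.
observed = {"CCC": 120, "CCE": 10, "CEC": 8, "CEE": 12,
            "ECC": 15, "ECE": 9, "EEC": 11, "EEE": 215}
predicted = {"CCC": 0.29, "CCE": 0.03, "CEC": 0.02, "CEE": 0.03,
             "ECC": 0.04, "ECE": 0.02, "EEC": 0.03, "EEE": 0.54}

g2 = g_squared(observed, predicted)
print(f"df = {fit_df()}, G2 = {g2:.2f}, acceptable fit: {g2 < CHI2_CRIT_DF1}")
```

Because `itertools.product` iterates in input order, `SEQUENCES` lists the eight sequences in the same order as the text. For a condition-wise comparison, `delta_g2_test` would receive the G² values of the restricted and unrestricted joint models.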
Failure to reject the null hypothesis [G²(1) < 3.84] therefore indicates that the model provides a statistically acceptable account of the data. So far, the dual-retrieval model has demonstrated excellent fits to various types of recall data, such as free, associative, cued, and serial recall (Brainerd et al., 2002; Brainerd & Reyna, 2010).
Examining model fit is always the first step in the statistical procedure of the dual-retrieval model. After model fit has been established, the next step is to compute maximum likelihood estimates of the model parameters. The final step is to test for differences in these parameters between experimental conditions. To test the null hypothesis that a parameter i does not differ significantly between conditions A and B, one first fits an unrestricted joint model to the combined data of conditions A and B; this joint model is created simply by combining two copies of the dual-retrieval model (one for condition A and one for condition B). Next, a restricted joint model is fitted in which parameter i is constrained to be equal in conditions A and B. The difference in G² between the restricted and unrestricted models, ∆G², is then compared to the critical value of χ²(1), which is 3.84. If ∆G² > 3.84, the null hypothesis is rejected, meaning that parameter i differs significantly between conditions A and B. A step-by-step guide to testing dual-retrieval model fits, estimating model parameters, and conducting condition-wise parameter significance tests can be found at https://www.human.cornell.edu/hd/research/labs/memorylab/research.
An Overview of the Present Experiments
As reviewed above, recent studies have provided accumulating evidence for JOL reactivity, and multiple theoretical explanations have been proposed.
However, because systematic research on this topic is so recent, experimentation has not yet settled on a theoretical account of this phenomenon. Therefore, the primary aim of the current study is to test the predictions of the two major theoretical accounts of JOL reactivity (the changed-goal hypothesis and the cue-strengthening hypothesis). The second aim is to specify the process-level mechanism of JOL reactivity by identifying the specific retrieval processes that are responsible for the reactive effects of JOLs. To achieve the first aim, I designed a series of hypothesis-driven experiments, whose major designs and hypotheses are described below (a summary of the major theoretical predictions can be found in Table 1.3). To achieve the second aim, I administered three consecutive recall tests in all experiments and fitted the dual-retrieval model (Chang, 2019) to the recall data.
In Experiment 1, I compared the reactive effects of item-level JOLs among strongly related, weakly related, and identical word pairs. Identical and strongly related pairs usually receive higher JOLs than weakly related pairs, suggesting that they are perceived as easier to remember. Under this circumstance, the changed-goal hypothesis predicts negative reactivity for weakly related pairs but positive reactivity for strongly related and identical pairs, because a switch from mastery-oriented to performance-oriented goals prompts participants to focus more on easier items at the cost of harder items. In contrast, the cue-strengthening hypothesis predicts positive reactivity for all three types of pairs, as JOLs should strengthen cue-target identity and cue-target relatedness in those pairs, both of which are diagnostic cues (i.e., cues that are useful on later memory tests).
In Experiment 2, I compared recall performance for related and unrelated word pairs among a prestudy-JOL condition, an immediate-JOL condition, and a no-JOL condition. Here, prestudy JOLs were made before the presentation of a word pair but with information provided about the pair type (i.e., related vs. unrelated), and immediate JOLs were conventional item-level JOLs made after studying each word pair. The changed-goal hypothesis predicts comparable reactivity of prestudy and immediate JOLs, because both types of JOLs increase participants’ awareness of the difference in learning difficulty between related and unrelated word pairs. Thus, it predicts that both immediate and prestudy JOLs would produce negative reactivity for unrelated pairs and positive reactivity for related pairs. In contrast, the cue-strengthening hypothesis predicts different patterns of reactivity for prestudy and immediate JOLs. For related pairs, it predicts positive reactivity of immediate JOLs and either no or only very weak positive reactivity of prestudy JOLs. This is because most diagnostic cues are not available until a pair is studied, and thus these cues can be strengthened by immediate JOLs but not by prestudy JOLs. For unrelated pairs, the cue-strengthening hypothesis predicts little-to-no reactivity of either immediate or prestudy JOLs. Given that cue-target relation is a dominant diagnostic cue for JOLs and that there is no inherent semantic relation between cue and target in unrelated pairs, both types of JOLs are less likely to draw upon and strengthen diagnostic cues for unrelated pairs than for related pairs.
In Experiment 3, I further tested the predictions of the cue-strengthening hypothesis.
Here, I investigated how item-level JOLs and list-level JOLs (i.e., aggregate JOLs that ask people to judge how many words they can recall from a studied list) react to target-target relatedness among word pairs on different criterion tests (associative versus free recall). For target-target related pairs, the cue-strengthening hypothesis predicts positive reactivity of list-level JOLs in free recall but negative or no reactivity of list-level JOLs in associative recall. This is because list-level JOLs should strengthen processing of inter-pair target-target relatedness, which is helpful in free recall but is either harmful or irrelevant to associative recall (Brainerd & Reyna, 2010; Schwenn & Underwood, 1968; Underwood et al., 1965). Additionally, the cue-strengthening hypothesis predicts either negative or no reactivity of item-level JOLs in free recall and little-to-no reactivity of item-level JOLs in associative recall. Regarding the former, free recall of target-target related pairs relies heavily on inter-pair processing, but item-level JOLs focus participants’ attention primarily on within-pair cue-target relatedness and deflect it from inter-pair target-target relatedness. Regarding the latter, associative recall relies heavily on the cue-target relation; however, since there is no inherent semantic relatedness between cue and target in any of the pairs, item-level JOLs are unlikely to draw upon and strengthen the cue-target relation. For target-target unrelated pairs, the cue-strengthening hypothesis predicts little-to-no reactivity of both list-level and item-level JOLs in both free and associative recall. This is because there is neither cue-target nor target-target relatedness for JOLs to draw upon and strengthen, and thus JOLs should not produce significant enhancements in memory performance.
In Experiment 4, I tested Senkova and Otani’s (2021) item-specific hypothesis, which is closely related to the cue-strengthening hypothesis in that it assumes that JOLs improve memory by enhancing item-specific cues embedded in the study materials. I tested this hypothesis by factorially manipulating the list organization of categorical lists (blocked, randomized) and JOL solicitation (item-JOL, no-JOL). The item-specific hypothesis predicts positive JOL reactivity in free recall for both blocked and randomized categorical lists, because item-level JOLs should enhance item-specific processing, which complements the relational processing naturally evoked by both types of categorical lists. Moreover, it predicts that item-level JOLs would primarily affect the direct access (D) or forgetting (F) parameters of the dual-retrieval model, which index recollection and forgetting of item-specific verbatim details, respectively.

Table 1.3
A Summary of the Major Theoretical Predictions for Recall Performance in Experiments 1-4

Experiment 1
Changed-goal hypothesis:
- Identical pairs: Item-JOL > No-JOL
- Strong pairs: Item-JOL > No-JOL
- Weak pairs: Item-JOL < No-JOL
Cue-strengthening hypothesis:
- Identical pairs: Item-JOL > No-JOL
- Strong pairs: Item-JOL > No-JOL
- Weak pairs: Item-JOL > No-JOL

Experiment 2
Changed-goal hypothesis:
- Related pairs: Immediate-JOL = Prestudy-JOL > No-JOL
- Unrelated pairs: Immediate-JOL = Prestudy-JOL < No-JOL
Cue-strengthening hypothesis:
- Related pairs: Immediate-JOL > Prestudy-JOL ≥ No-JOL
- Unrelated pairs: Immediate-JOL = Prestudy-JOL = No-JOL

Experiment 3
Cue-strengthening hypothesis:
- Target-target related pairs:
  - Free recall: List-JOL > No-JOL, Item-JOL ≤ No-JOL
  - Associative recall: List-JOL ≤ No-JOL, Item-JOL = No-JOL
- Target-target unrelated pairs:
  - Free recall: List-JOL = No-JOL, Item-JOL = No-JOL
  - Associative recall: List-JOL = No-JOL, Item-JOL = No-JOL

Experiment 4
Item-specific hypothesis:
- Blocked categorical lists: Item-JOL > No-JOL
- Randomized categorical lists: Item-JOL > No-JOL
- Item-level JOLs primarily affect the D or F parameters

Note. Item-JOL, immediate-JOL, prestudy-JOL, list-JOL, and no-JOL all refer to the corresponding JOL conditions. D = direct access parameter. F = forgetting parameter. Additionally, “=” means statistically equivalent recall (i.e., little-to-no reactivity), “>” means significantly better recall (i.e., positive reactivity), “≥” means significantly better or statistically equivalent recall (i.e., positive or no reactivity), “<” means significantly worse recall (i.e., negative reactivity), and “≤” means significantly worse or statistically equivalent recall (i.e., negative or no reactivity).

CHAPTER 2
EXPERIMENT 1
In Experiment 1, I used three types of cue-target word pairs (weakly related, strongly related, and identical). In this scenario, the changed-goal hypothesis and the cue-strengthening hypothesis make different predictions about JOL reactivity. Prior JOL studies have established that participants rank the memorability of the three types of pairs as identical pairs > strongly related pairs > weakly related pairs (Castel et al., 2007; Ikeda et al., 2016). Thus, the changed-goal hypothesis predicts that, relative to when JOLs are not solicited, JOLs prompt participants to focus more on the least and moderately difficult items (identical and strongly related pairs) and less on the most difficult items (weakly related pairs). As a result, JOLs would produce positive reactivity for identical and strongly related pairs but negative reactivity for weakly related pairs. In contrast, the cue-strengthening hypothesis predicts positive JOL reactivity for all three types of word pairs.
This is because targets of strongly related, weakly related, and identical pairs are all better recalled than those of unrelated pairs (Castel et al., 2007), indicating that cue-target semantic relation (in both strongly and weakly related pairs) and cue-target identity (in identical pairs) are both diagnostic cues in subsequent associative recall tests. Therefore, because associative recall tests are sensitive to pair relation and word identity, the solicitation of JOLs, which presumably strengthens these cues, should improve memory performance for all three types of pairs. Here, I pitted the two hypotheses against each other by examining whether JOL reactivity differs among strongly related, weakly related, and identical pairs. Specifically, I used a 3 (Pair type: weakly related, strongly related, identical) × 2 (JOL condition: item-JOL, no-JOL) mixed design, with pair type manipulated within subjects and JOL condition manipulated between subjects.

Method

Participants

Participants were 88 Cornell undergraduates (Mage = 20.31, SDage = 2.24) who participated for extra course credit. Forty-one participants were randomly assigned to the item-JOL condition, and 47 were randomly assigned to the no-JOL condition. The sample size per condition was comparable to that of Castel et al. (2007), in which the JOL pattern of identical pairs > strongly related pairs > weakly related pairs and the recall pattern of strongly related pairs > weakly related pairs = identical pairs > unrelated pairs were established.

Materials

The experiment was programmed and administered via Qualtrics. The materials were 72 cue-target word pairs constructed based on the Nelson free association norms (D. L. Nelson et al., 2004). All word pairs used in Experiment 1 can be found in Appendix B.
Among the 72 word pairs, 24 were pairs with strong forward associative strength (Mforward = .53, SDforward = .16; e.g., spoon - fork), 24 were pairs with weak forward associative strength (Mforward = .02, SDforward = .01; e.g., beard - trim), and 24 were identical pairs (e.g., ladder - ladder). The three types of word pairs were matched on concreteness, word frequency, and word length for both cues and targets.

Procedure

Participants were randomly assigned to either an item-JOL condition or a no-JOL condition. All participants completed two blocks, and the order of the blocks was counterbalanced across participants. Each block consisted of a study phase and a test phase. In the study phase, participants studied 36 word pairs, including 12 weakly related pairs, 12 strongly related pairs, and 12 identical pairs. In the test phase, they completed three associative recall tests for the word pairs they had just studied. The item-JOL and no-JOL conditions differed only in the study phase. In the study phase of both conditions, each word pair was presented for 10 seconds. In the item-JOL condition, participants were informed in advance that for each word pair they studied, they would be required to rate how likely they would be able to recall the word on the right-hand side of the pair when provided with the word on the left-hand side on a later memory test (from 0 to 100, with 0 = not likely at all and 100 = totally likely). Participants were also told to fine-tune their judgments by using the whole 100-point percentage scale. After each pair had been presented for 4 seconds, a JOL prompt ("Likelihood to recall?") appeared beneath the word pair. Participants were given a maximum of 6 seconds to type their JOLs into a blank box under the JOL prompt. When 6 seconds were up, the screen cleared and the program automatically proceeded to the next word pair.
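The fixed study-trial timing described above can be summarized in a short Python sketch. It assumes, as the description implies, that an item-JOL trial always lasted the full 10 seconds (the screen cleared only when the 6-second JOL window was up, rather than as soon as a JOL was typed). The names `Event`, `study_trial`, and `block_study_duration` are hypothetical and serve only to make the timeline explicit.

```python
from __future__ import annotations

from dataclasses import dataclass

TRIAL_SECONDS = 10.0       # every Experiment 1 study trial lasted 10 s
PAIR_ALONE_SECONDS = 4.0   # item-JOL condition: pair shown alone first
JOL_WINDOW_SECONDS = 6.0   # then the JOL prompt was added for up to 6 s


@dataclass(frozen=True)
class Event:
    onset: float    # seconds from trial start
    offset: float   # seconds from trial start
    display: str    # what was on screen


def study_trial(condition: str) -> list[Event]:
    """Return the on-screen events of one study trial (hypothetical helper)."""
    if condition == "item-JOL":
        return [
            Event(0.0, PAIR_ALONE_SECONDS, "word pair"),
            Event(PAIR_ALONE_SECONDS, PAIR_ALONE_SECONDS + JOL_WINDOW_SECONDS,
                  "word pair + 'Likelihood to recall?' prompt (0-100 entry box)"),
        ]
    if condition == "no-JOL":
        return [Event(0.0, TRIAL_SECONDS, "word pair")]
    raise ValueError(f"unknown condition: {condition!r}")


def block_study_duration(condition: str, n_pairs: int = 36) -> float:
    """Total study-phase time for one block, assuming no trial ends early."""
    return n_pairs * study_trial(condition)[-1].offset


for condition in ("item-JOL", "no-JOL"):
    print(condition, block_study_duration(condition))
```

Both conditions yield 360 seconds of study per block (36 pairs × 10 s), which makes the design choice visible: soliciting JOLs changed what participants did during the last 6 seconds of each trial, not the total time each pair remained on screen.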
In the no-JOL condition, the only difference from the item-JOL condition was that all word pairs were presented for 10 seconds without JOL prompts. In the test phase, participants completed three consecutive associative recall tests, each preceded by a 1-min buffer task of simple math problem solving. Participants were not given additional study trials before the second or third test. Before each recall test, participants were reminded that spelling did not count and that they did not need to worry about it. There were 36 associative recall test trials in total, corresponding to the 36 studied word pairs. In each test trial, participants were provided with the cue word of a pair and were given a maximum of 10 seconds to type the target word that had been paired with that cue during the study phase. They were allowed to advance after 2 seconds, but they were instructed to do so only when they had finished typing the target word or were certain that they could not recall it. Otherwise, the program automatically advanced to the next cue when 10 seconds were up. The order of the test trials was randomized for each participant.

Results

ANOVA Results for JOLs

A one-way repeated measures ANOVA was conducted to compare the effects of pair type (weakly related, strongly related, identical) on JOLs. The ANOVA revealed a main effect of pair type, F(2, 80) = 50.11, MSE = 72.18, ηp² = .56, p < .001. This main effect was driven by the fact that identical pairs (M = 72.2, SD = 17.8) and strongly related pairs (M = 69.1, SD = 15.4) both received higher JOLs than weakly related pairs (M = 54.6, SD = 15.7), whereas JOLs did not differ significantly between the former two pair types.

ANOVA Results for Recall

A 3 (Pair type: weakly related, strongly related, identical) × 2 (JOL condition: item-JOL, no-JOL) × 3 (Test: 1, 2, 3) mixed ANOVA was conducted for recall.
The ANOVA revealed both a main effect of pair type, F(2, 172) = 139.10, MSE = .04, ηp² = .62, p < .001, and a main effect of JOL condition, F(1, 86) = 14.92, MSE = .43, ηp² = .15, p < .001. Post hoc tests¹ revealed that participants recalled more identical pairs (M = .79, SD = .26) and strongly related pairs (M = .76, SD = .23) than weakly related pairs (M = .53, SD = .26), with no significant difference in recall between identical and strongly related pairs. Meanwhile, as can be seen in Figure 2.1, the main effect of JOL condition was driven by better recall in the item-JOL condition (M = .79, SD = .23) than in the no-JOL condition (M = .61, SD = .29). Notably, the JOL condition × Pair type interaction was not significant, suggesting that JOLs improved recall to a comparable extent for all three types of word pairs.

¹ Unless otherwise specified, post hoc tests refer to Tukey's test throughout the present dissertation.

Figure 2.1. Associative recall for identical, strongly related, and weakly related word pairs across item-JOL and no-JOL conditions in Experiment 1. Panel A = recall test 1. Panel B = recall test 2. Panel C = recall test 3. Panel D = average recall across all three tests. Error bars represent SEs.

Model Results

The associative recall data were further analyzed with the dual-retrieval model (Chang, 2019). As can be seen in Table 2.1, the model delivered excellent fits across all six combinations of pair type (strongly related, weakly related, identical) and JOL condition (item-JOL, no-JOL). The average G²(1) was .08, far below the critical value of 3.84.
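For readers less familiar with this fit criterion, G² is the standard likelihood-ratio statistic 2 Σ O ln(O/E) computed over observed (O) and model-expected (E) response frequencies, and a fit is acceptable when G² stays below the chi-square critical value for its degrees of freedom. The Python sketch below, which uses only the standard library, recovers the critical value of 3.84 for 1 df at α = .05 and the average G²(1) of .08 from the six values in Table 2.1. It also converts a ∆G² into a p-value, assuming (consistent with the reported values) that each condition-wise parameter test has 1 df. The helper names are hypothetical, and the sketch does not implement the dual-retrieval model itself.

```python
import math
from statistics import NormalDist


def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)).

    Cells with O == 0 contribute 0, the limit of O * ln(O / E) as O -> 0.
    """
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def chi2_df1_critical(alpha=0.05):
    """Chi-square(1) critical value: the square of the two-tailed z cutoff."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def chi2_df1_p(statistic):
    """Upper-tail p-value of a chi-square(1) statistic, i.e. P(|Z| > sqrt(statistic))."""
    return 2.0 * (1.0 - NormalDist().cdf(math.sqrt(statistic)))


# G^2(1) values from Table 2.1 (Strong, Weak, Identical x Item-JOL, No-JOL).
table_2_1_g2 = [0.00, 0.04, 0.26, 0.06, 0.02, 0.08]
mean_g2 = sum(table_2_1_g2) / len(table_2_1_g2)
print(round(mean_g2, 2), round(chi2_df1_critical(), 2))  # 0.08 3.84
print(round(chi2_df1_p(6.22), 3))                         # 0.013
```

The last line reproduces the p = .013 reported for the J3 comparison of weakly related pairs, which is a quick way to confirm that a reported ∆G² and its p-value are mutually consistent under 1 df.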
Table 2.1
Dual-Retrieval Model Fits and Parameter Estimates for Experiment 1

Pair type   JOL condition   G²    D      F      J1    J2    J3     R
Strong      Item-JOL        .00   .80*   .01*   .63   .84   .69    .39
            No-JOL          .04   .59*   .03*   .62   .73   .68    .37
Weak        Item-JOL        .26   .57*   .00*   .62   .75   .94*   .17
            No-JOL          .06   .39*   .03*   .58   .77   .78*   .15
Identical   Item-JOL        .02   .79*   .01*   .61   .74   .76    .67*
            No-JOL          .08   .61*   .05*   .55   .67   .68    .45*

Note. Strong = strongly related pairs; weak = weakly related pairs; identical = identical pairs. D = direct access parameter; F = forgetting parameter; J1 = familiarity judgment parameter for test 1; J2 = familiarity judgment parameter for test 2; J3 = familiarity judgment parameter for test 3; R = reconstruction parameter. Parameters that differed significantly between item- and no-JOL conditions are marked with an asterisk.

Next, to determine which underlying processes were responsible for JOL reactivity, I compared the retrieval parameters between the item-JOL and no-JOL conditions for each pair type. The condition-wise parameter tests revealed that the D parameter was consistently higher in the item-JOL condition than in the no-JOL condition for strongly related (.80 vs. .59), weakly related (.57 vs. .39), and identical pairs (.79 vs. .61), ∆G²s > 15.06, ps < .001. This suggests that item-level JOLs enhanced direct access to item-specific verbatim details for all pair types. Conversely, the F parameter was consistently lower in the item-JOL condition than in the no-JOL condition for strongly related (.01 vs. .03), weakly related (.00 vs. .03), and identical pairs (.01 vs. .05), ∆G²s > 7.64, ps < .006, suggesting that JOLs provided a buffer against forgetting of verbatim traces for all pair types. Meanwhile, for weakly related pairs only, the J3 parameter was higher in the item-JOL condition than in the no-JOL condition (.94 vs. .78), ∆G² = 6.22, p = .013.
Thus, after making item-level JOLs, participants were more likely to output reconstructed target words of weakly related pairs on the last recall test, because those target words felt more familiar to them. Last, for identical pairs, the R parameter was higher in the item-JOL condition than in the no-JOL condition (.67 vs. .45), ∆G² = 23.64, p < .001. This indicates that JOLs enhanced participants' ability to reconstruct the target words of identical pairs when they did not have direct access to verbatim details of those words.

Discussion

Experiment 1 showed that item-level JOLs produced positive reactivity for all three types of word pairs (strongly related, weakly related, and identical), and the effect of JOL condition did not interact with the effect of pair type. Recall that the changed-goal hypothesis predicts that item-level JOLs would produce differential reactivity for the three types of pairs: positive reactivity for identical and strongly related pairs but negative reactivity for weakly related pairs. This is because the former two types of word pairs are perceived as easier to learn than the latter, so they should be prioritized when JOLs prompt participants to switch their learning goals from mastery-oriented to performance-oriented. By contrast, the cue-strengthening hypothesis predicts that item-level JOLs should produce positive reactivity for all three types of pairs, as both cue-target relatedness and cue-target identity are diagnostic cues in subsequent memory tests. Clearly, my results support the cue-strengthening hypothesis rather than the changed-goal hypothesis. Notably, the current results provide rather strong counterevidence against the changed-goal hypothesis. Mitchum et al. (2016) and Janes et al.
(2018) suggested that even in the absence of significant negative reactivity, a larger discrepancy in memory performance between easier and harder items may reflect a shift from mastery-oriented to performance-oriented goals, which is in line with the changed-goal hypothesis. However, my results showed no interaction between JOL condition and pair type, indicating that the recall difference between easier (strongly related and identical pairs) and harder items (weakly related pairs) remained invariant between the item-JOL and no-JOL conditions. Additionally, no negative reactivity was observed even at the trend level; instead, there was reliable and consistent positive reactivity across all three types of word pairs. Thus, it seems difficult for the changed-goal hypothesis to accommodate the current results without additional assumptions. For the first time, I was able to identify which underlying retrieval processes are altered by the solicitation of JOLs via the implementation of the dual-retrieval model. As shown in Table 2.1, the D parameter was consistently higher, and the F parameter consistently lower, in the item-JOL condition than in the no-JOL condition across all pair types. This suggests that making item-level JOLs helped participants retain direct access to the verbatim details of studied word pairs and reduced their susceptibility to forgetting. In addition, there were further enhancements in familiarity judgment (J3) for weakly related pairs and in reconstruction (R) for identical pairs. Thus, although the reactive effects of JOLs were comparable for all three pair types at the behavioral level, the underlying process-level mechanisms were slightly different.
This was expected based on the cue-strengthening hypothesis, because different cues might be strengthened during the process of making JOLs for different word pairs (e.g., the strength of pair association, the identity of words), and thus the underlying memory processes could be affected in different ways.

It may be noted that previous studies showed that JOLs follow a pattern of identical pairs > strongly related pairs > weakly related pairs (Castel et al., 2007; Ikeda et al., 2016). This pattern implies that the mechanism proposed by the changed-goal hypothesis should have been operative in the current experiment, because some pairs are perceived to be easier to remember than others. The current results showed a slightly different JOL pattern: identical pairs = strongly related pairs > weakly related pairs. One possible reason for this minor difference is that, unlike the previous studies (Castel et al., 2007; Ikeda et al., 2016), I did not include unrelated word pairs in the current design. The elimination of unrelated pairs may increase the granularity of the comparisons participants make between pair types. That is, with unrelated pairs taken out of the picture, participants' focus of comparison may switch from a categorical judgment of the presence versus absence of relatedness (related versus unrelated pairs) to a more fine-grained judgment about the strength of relation (strongly related versus weakly related pairs). Thus, the differences between strongly related and weakly related pairs may appear more salient, driving up the perceived memorability of strongly related pairs. Nevertheless, the slight differences in the JOL pattern should not undermine my goal of testing the prediction of the changed-goal hypothesis.
Because there were significant and large differences in JOLs between identical and weakly related pairs and between strongly and weakly related pairs, participants evidently could distinguish the learning difficulty of these pair types. That is, participants perceived identical and strongly related pairs as easier to remember than weakly related pairs, which secures the foundation for the changed-goal hypothesis's prediction of differential reactivity for identical and strongly related pairs versus weakly related pairs.

It may also be noted that although Soderstrom et al. (2015) reported no reactivity for weakly related pairs, the current experiment demonstrated positive reactivity for weakly related pairs. Nevertheless, the current findings are not without precedent, as Tauber and Witherby (2019; Experiments 3, 4, & 5) also found positive JOL reactivity for weakly related pairs with young adults. It is worth mentioning that the weakly related pairs used in the current experiment and in Tauber and Witherby (2019) were both constructed based on the Nelson free association norms (D. L. Nelson et al., 2004), whereas those used in Soderstrom et al. (2015) were constructed based on the MRC psycholinguistic database (Coltheart, 1981). Thus, as Tauber and Witherby noted, the discrepant findings may be attributable to differences in stimuli, such as differences in the level of cue-target relatedness in weakly related pairs.

CHAPTER 3
EXPERIMENT 2

In Experiment 2, I tested the predictions of both the changed-goal hypothesis and the cue-strengthening hypothesis about the reactivity of prestudy JOLs. Prestudy JOLs, first developed by Castel (2008), are made on the basis of information provided about a to-be-studied item but before the item itself is presented. For instance, before each word pair was presented, Mueller et al.
(2013; Experiment 1) told participants whether they were about to study a related or an unrelated word pair and asked them to rate how likely they would be to recall the word pair on a later memory test. They found that, similar to immediate JOLs (i.e., the conventional item-level JOLs that are made immediately after each word pair is presented), prestudy JOLs were substantially higher for related pairs than for unrelated pairs. This suggests that participants could differentiate the learning difficulty of these two types of word pairs based on the provided prompts, even before the specific pairs were studied. According to the changed-goal hypothesis, immediate JOLs produce positive reactivity for related word pairs but negative reactivity for unrelated word pairs because the solicitation of JOLs heightens participants' awareness that related pairs are more likely to be remembered than unrelated ones. As a result, participants de-emphasize their goal of mastering all the pairs and instead focus more on learning relatively easier (related) pairs at the expense of relatively harder (unrelated) pairs (Mitchum et al., 2016). Following this logic, the changed-goal hypothesis predicts that prestudy JOLs should produce reactive effects similar to those of immediate JOLs, because prestudy JOLs can also remind participants of the differences in learning difficulty between related and unrelated pairs. Thus, prestudy JOLs should similarly switch participants' goals from mastery-oriented to performance-oriented, thereby improving memory for related pairs but impairing memory for unrelated pairs. The cue-strengthening hypothesis, on the other hand, predicts a different pattern of JOL reactivity. Recall that Koriat's (1997) cue-utilization framework posits that immediate JOLs are based on three general classes of cues: intrinsic, extrinsic, and mnemonic.
To recap, intrinsic cues are inherent characteristics of the study items, extrinsic cues are factors relevant to encoding conditions or operations, and mnemonic cues are factors that pertain to participants' subjective experience of learning the items. Because prestudy JOLs are made before the items are actually studied, they cannot possibly be based on mnemonic cues. However, with participants' knowledge about item type (e.g., related versus unrelated pairs) and encoding conditions (e.g., a single study opportunity), prestudy JOLs can still be based on intrinsic and extrinsic cues (Undorf & Bröder, 2020). Nevertheless, only a small portion of intrinsic cues is accessible when making prestudy JOLs, because the prestudy JOL prompts provide very limited information about the intrinsic properties of the to-be-remembered pairs. To illustrate, participants receive the same prompt, "You are about to study a related word pair," for all related pairs, without any information that is specific to an individual pair. With such a standardized prompt, participants merely know that the cue and target are semantically related, but they know nothing about how the cue and target are related (e.g., categorical relation, synonym, antonym) or how strongly they are related (e.g., relatively strong or weak). Note that associative recall tests require participants to remember the cue-target pairing that is specific to each pair. Thus, it is unclear how a uniform prompt for all related pairs could help participants output the specific target word that is paired with a given cue. Similarly, only partial extrinsic cues are processed when making prestudy JOLs. Because participants may adopt different strategies for different pairs, the encoding strategy adopted for a specific pair might not be accessible until that pair is encoded.
Furthermore, JOLs have been found to be less sensitive to extrinsic cues than to intrinsic and mnemonic cues (Koriat, 1997). Thus, it is unclear whether the partial extrinsic cues can be strengthened by JOLs and picked up by subsequent memory tests. In summary, because substantially fewer cues are processed in making prestudy JOLs than in making immediate JOLs, and associative recall tests are not necessarily sensitive to the partial intrinsic and extrinsic cues that are processed in making prestudy JOLs, the cue-strengthening hypothesis predicts either much weaker reactivity of prestudy JOLs relative to immediate JOLs or no reactivity at all.

To test these predictions, I used a 2 (Pair type: related, unrelated) × 3 (JOL condition: prestudy-JOL, immediate-JOL, no-JOL) mixed design, with pair type manipulated within subjects and JOL condition manipulated between subjects. The changed-goal hypothesis predicts similar reactivity for prestudy and immediate JOLs, namely positive reactivity for related pairs but negative reactivity for unrelated pairs. However, the cue-strengthening hypothesis predicts different patterns of reactivity for prestudy and immediate JOLs: Immediate JOLs should produce positive reactivity for related pairs, whereas prestudy JOLs should produce either no reactivity or weaker positive reactivity for related pairs than immediate JOLs. Both prestudy and immediate JOLs should produce little-to-no reactivity for unrelated pairs.

Method

Participants

Participants were 119 young adults (Mage = 23.13, SDage = 4.64) recruited from Prolific, an online experiment platform (Palan & Schitter, 2018). They were all fluent English speakers located in the United States, Canada, or the United Kingdom. Each participant was compensated $5.85 for participation.
Forty-two participants were randomly assigned to the immediate-JOL condition, 34 to the prestudy-JOL condition, and 41 to the no-JOL condition. To ensure data quality, an attention check was implemented for each study trial in the no-JOL condition (see the Procedure section below). Following Myers et al. (2020), I adopted the criterion that participants who failed to provide JOLs (in the immediate- and prestudy-JOL conditions) or to complete attention check questions (in the no-JOL condition) for at least 80% of the study trials would be removed. No participant missed more than 21% of the JOLs or attention check responses, suggesting that participants complied with the instructions. However, six participants were removed from analyses because they indicated in the post-experiment survey that they had taken notes during the study phase, even though they had been explicitly warned at the beginning of the experiment not to take notes during study, at the risk of missing subsequent words or attention check questions. This left a final sample of 40 participants in the immediate-JOL condition, 34 in the prestudy-JOL condition, and 39 in the no-JOL condition. The sample size of every condition was larger than that of Mueller et al. (2013), which established robust differences in prestudy JOLs between related and unrelated pairs.

Materials

The experiment was programmed and administered via Qualtrics. The materials were 80 cue-target word pairs constructed based on the Nelson free association norms (D. L. Nelson et al., 2004). Half of the 80 pairs were related word pairs (Mforward = .45, SDforward = .13; e.g., shore - beach), and the other half were unrelated word pairs (e.g., brush - coffee).
I carefully matched concreteness, word frequency, and word length for both cues and targets between the two types of word pairs. All word pairs used in Experiment 2 can be found in Appendix C.

Procedure

Participants were randomly assigned to the immediate-JOL condition, the prestudy-JOL condition, or the no-JOL condition. All participants completed two blocks, and the order of blocks was counterbalanced across participants. Each block consisted of a study phase and a test phase. In the study phase, participants studied 40 word pairs, including 20 related pairs and 20 unrelated pairs. Each pair was presented for 4 seconds. In the immediate-JOL condition, after each pair had been presented for 4 seconds, the word pair disappeared and a JOL prompt ("Likelihood to recall?") appeared. Participants were given a maximum of 5 seconds to rate how likely they would be able to recall the word on the right-hand side of the pair when provided with the word on the left-hand side on a later memory test (from 0 to 100, with 0 = not likely at all and 100 = totally likely). They were also told to fine-tune their judgments by using the entire 100-point percentage scale. When 5 seconds were up, the screen cleared and the program automatically proceeded to the next word pair. In the prestudy-JOL condition, before each word pair was presented, participants saw a statement informing them of the type of the upcoming pair ("You are about to study a related/unrelated word pair"), along with a JOL prompt ("Likelihood to recall?") beneath it. They were given 5 seconds to make a JOL for the upcoming word pair on the same 0-100 scale as in the immediate-JOL condition. When 5 seconds were up, the screen cleared and the corresponding word pair was presented for 4 seconds. In the no-JOL condition, the only difference from the immediate-JOL condition was that after each word pair had been presented for 4 seconds, participants were not required to make any JOL ratings.
Instead, they saw a screen with two blank boxes and were instructed to check both boxes within 5 seconds. This procedure, adapted from Bowen et al. (2020), was intended to discourage participants from writing down the words and to check whether they were paying attention throughout the study phase. In the test phase, participants completed three consecutive associative recall tests, each preceded by a 1-min buffer task of math problem solving. The procedure in the test phase was the same as in Experiment 1. After completing both blocks, participants completed a very brief post-experiment survey, which asked whether they had taken notes during the experiment and whether they had any feedback or concerns regarding the experiment.

Results

ANOVA Results for JOLs

A 2 (Pair type: related, unrelated) × 2 (JOL condition: prestudy-JOL, immediate-JOL) mixed ANOVA revealed a main effect of pair type, F(1, 72) = 240.87, MSE = 180.48, ηp² = .77, p < .001, as related word pairs (M = 67.60, SD = 18.99) elicited higher JOLs than unrelated word pairs (M = 32.43, SD = 18.91). There was also a main effect of JOL condition, F(1, 72) = 4.87, MSE = 469.36, ηp² = .06, p = .03, as immediate JOLs (M = 53.64, SD = 27.31) were overall higher than prestudy JOLs (M = 45.76, SD = 23.49). Last, there was a Pair type × JOL condition interaction, F(1, 72) = 18.55, MSE = 180.48, ηp² = .20, p < .001. As can be seen in Figure 3.1, the interaction was driven by the fact that immediate JOLs were higher than prestudy JOLs for related pairs (Ms = 75.61 vs. 58.18) but not for unrelated pairs (Ms = 31.67 vs. 33.33).

Figure 3.1. JOLs for related and unrelated pairs across immediate- and prestudy-JOL conditions in Experiment 2. Error bars represent SEs.
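Partial eta-squared can be recovered directly from an F ratio and its degrees of freedom: because ηp² = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error), it follows that ηp² = F·df_effect / (F·df_effect + df_error). The Python sketch below applies this identity to the JOL ANOVA above; the error df of 72 follows from the 74 participants who made JOLs (40 immediate-JOL + 34 prestudy-JOL) minus the two groups, which for these two-level effects is the error df of every term. The function name is hypothetical, and the sketch is a consistency check on the reported statistics rather than part of the original analysis.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its degrees of freedom.

    Since F = (SS_effect / df_effect) / (SS_error / df_error),
    SS_effect / (SS_effect + SS_error) = F * df_effect / (F * df_effect + df_error).
    """
    numerator = f * df_effect
    return numerator / (numerator + df_error)


# F ratios reported for the Experiment 2 JOL ANOVA, each tested on (1, 72) df.
reported = {
    "pair type": 240.87,
    "JOL condition": 4.87,
    "pair type x JOL condition": 18.55,
}
for effect, f in reported.items():
    print(f"{effect}: {partial_eta_squared(f, 1, 72):.2f}")
# pair type: 0.77
# JOL condition: 0.06
# pair type x JOL condition: 0.20
```

The recovered values match the reported ηp² of .77, .06, and .20; the same identity also reproduces the Experiment 1 value for pair type, F(2, 80) = 50.11, giving .56.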
ANOVA Results for Recall

A 2 (Pair type: related, unrelated) × 3 (JOL condition: prestudy-JOL, immediate-JOL, no-JOL) × 3 (Test: 1, 2, 3) mixed ANOVA revealed a main effect of JOL condition, F(2, 110) = 7.48, MSE = .15, ηp² = .12, p = .001, a main effect of pair type, F(1, 110) = 698.70, MSE = .05, ηp² = .86, p < .001, and a Pair type × Test interaction, F(2, 220) = 10.36, MSE = .0007, ηp² = .09, p < .001. Post hoc tests suggested that recall in the immediate-JOL condition (M = .64, SD = .29) was overall better than in the no-JOL condition (M = .55, SD = .28) and in the prestudy-JOL condition (M = .50, SD = .30), whereas the latter two conditions did not differ significantly. Not surprisingly, recall was also better for related pairs (M = .79, SD = .15) than for unrelated pairs (M = .35, SD = .23). Although the Pair type × Test interaction indicates that the size of this advantage varied across tests, related pairs were recalled better than unrelated pairs on all three tests (see Figure 3.2).

Although the JOL condition × Pair type interaction did not reach the conventional level of statistical significance, I still compared recall performance among the three JOL conditions separately for related and unrelated word pairs, because these comparisons are critical for testing the predictions of the changed-goal hypothesis and the cue-strengthening hypothesis. Specifically, I conducted two additional 3 (JOL condition: prestudy-JOL, immediate-JOL, no-JOL) × 3 (Test: 1, 2, 3) mixed ANOVAs, one for related word pairs and the other for unrelated word pairs. For related word pairs, the ANOVA revealed a main effect of JOL condition, F(2, 110) = 10.04, MSE = .05, ηp² = .15, p < .001.
As shown in Figure 3.2, recall performance was better in the immediate-JOL condition (M = .86, SD = .11) than in the no-JOL condition (M = .76, SD = .15) and in the prestudy-JOL condition (M = .74, SD = .16), ps ≤ .001, whereas recall did not differ between the latter two conditions. This suggests that immediate JOLs produced positive reactivity for related word pairs, whereas prestudy JOLs did not. For unrelated word pairs, there was also a main effect of JOL condition, F(2, 110) = 4.54, MSE = .15, ηp² = .08, p = .013. This main effect was driven by lower recall in the prestudy-JOL condition (M = .26, SD = .19) than in the immediate-JOL condition (M = .42, SD = .24), p = .009. Although Figure 3.2 suggests numerical differences in recall between the immediate- and no-JOL conditions (no-JOL: M = .35, SD = .23) and between the prestudy- and no-JOL conditions, post hoc tests showed that neither difference reached statistical significance. Therefore, neither immediate nor prestudy JOLs had reliable reactive effects on recall of unrelated word pairs.

Figure 3.2. Associative recall for related and unrelated pairs across immediate-, prestudy-, and no-JOL conditions in Experiment 2. Panel A = recall test 1. Panel B = recall test 2. Panel C = recall test 3. Panel D = average recall across all three tests. Error bars represent SEs.

Model Results

The associative recall data were fit to the same dual-retrieval model as in Experiment 1 (Chang, 2019). As can be seen in Table 3.1, the model delivered excellent fits to the recall data across all six combinations of JOL condition (immediate-, prestudy-, and no-JOL) and pair type (related, unrelated), with an average G²(1) of 1.44. For related word pairs, the D parameter was higher in the immediate-JOL condition than in the prestudy-JOL and no-JOL conditions (.81 vs. .68 vs. .68), ∆G²s > 9.84, ps < .002.
This suggests that immediate JOLs enhanced participants’ direct access to verbatim details, whereas prestudy JOLs did not. Meanwhile, the F parameter was significantly higher in the prestudy-JOL condition (.08) than in the other two JOL conditions (.04 and .05), ∆G²s > 4.74, ps < .029, indicating that prestudy JOLs induced more forgetting of verbatim traces.

Table 3.1
Dual-Retrieval Model Fits and Parameter Estimates for Experiment 2

| Pair type | JOL condition | G² | D | F | J1 | J2 | J3 | R |
|---|---|---|---|---|---|---|---|---|
| Related | Immediate-JOL | .24 | .81 | .04 | .67 | .86 | .87 | .47 |
| Related | Prestudy-JOL | 2.20 | .68 | .08 | .59 | .75 | .75 | .42 |
| Related | No-JOL | 2.76 | .68 | .05 | .57 | .79 | .86 | .43 |
| Unrelated | Immediate-JOL | 1.34 | .35 | .03 | .60 | .81 | .81 | .16 |
| Unrelated | Prestudy-JOL | .19 | .24 | .13 | .40 | .66 | .77 | .10 |
| Unrelated | No-JOL | 1.92 | .31 | .03 | .40 | .72 | .76 | .11 |

Note. D = direct access parameter; F = forgetting parameter; J1 = familiarity judgment parameter for test 1; J2 = familiarity judgment parameter for test 2; J3 = familiarity judgment parameter for test 3; R = reconstruction parameter. Parameters that differed significantly between immediate-, prestudy-, and no-JOL conditions are printed in boldface.

For unrelated pairs, the D parameter was comparable between the immediate- and no-JOL conditions (.35 vs. .31), but it was higher in both of these conditions than in the prestudy-JOL condition (.24), ∆G²s > 12.38, ps < .001. This suggests that immediate JOLs did not improve direct access to verbatim details, whereas prestudy JOLs undermined it. Meanwhile, the F parameter was again higher in the prestudy-JOL condition than in the other two JOL conditions (.13 vs. .03 vs. .03), ∆G²s > 23.09, ps < .001, suggesting that making prestudy JOLs made unrelated pairs more forgettable. Additionally, unrelated word pairs felt more familiar when participants had made immediate JOLs after encoding them: the J1 parameter was higher in the immediate-JOL condition than in the other two JOL conditions (.60 vs. .40 vs. .40), ∆G²s > 3.89, ps < .049.
Last, immediate JOLs also elevated the R parameter relative to the prestudy-JOL condition (.16 vs. .11), ∆G² = 6.76, p = .009. This indicates that participants found it easier to reconstruct the target word of a pair from partial information when they had made a JOL for the pair after encoding it than when they had made one before encoding it.

Discussion

Previously, prestudy JOLs were mostly used to isolate the contribution of metacognitive beliefs to JOLs. Thus, most prior studies that administered prestudy JOLs were interested in comparing prestudy and immediate JOLs. Among such studies, some compared memory performance between the two JOL conditions and reported better recall in the immediate-JOL condition than in the prestudy-JOL condition (Mueller et al., 2013, 2016; Undorf & Bröder, 2020), whereas others found no difference in memory performance between the two conditions (Price & Harrison, 2017; Witherby & Tauber, 2017a). My results aligned with the former. Notably, the current study was the first to directly compare a prestudy-JOL condition with a no-JOL control condition to examine whether prestudy JOLs, like immediate JOLs, produce reactivity. With the no-JOL condition serving as a baseline, I compared reactivity between prestudy and immediate JOLs, which provided an attractive testbed for both the cue-strengthening hypothesis and the changed-goal hypothesis. My results showed that prestudy JOLs produced no reactivity in associative recall for either related or unrelated word pairs. This is consistent with the cue-strengthening hypothesis. That hypothesis predicts that prestudy JOLs should have no effect on subsequent recall, or a weaker effect than immediate JOLs, because prestudy JOLs are based on fewer and less diagnostic cues. By contrast, the current result runs contrary to the prediction of the changed-goal hypothesis.
Even though prestudy JOLs were significantly higher for related than for unrelated pairs, prestudy JOLs produced neither positive reactivity for related pairs nor negative reactivity for unrelated pairs. The difference in JOLs suggests that participants were aware of the difference in learning difficulty between these two types of word pairs. Thus, the absence of reactive effects of prestudy JOLs provided no evidence that participants had changed their learning goal by focusing more on related pairs at the cost of unrelated pairs. Meanwhile, immediate JOLs produced positive reactivity for related word pairs but no reactivity for unrelated word pairs, which again fits the cue-strengthening hypothesis better than the changed-goal hypothesis.

Next, the model results shed light on the reactive effects of JOLs at the level of retrieval processes. As shown in Table 3.1, immediate JOLs enhanced recollection of verbatim details for related pairs but not for unrelated pairs, which echoes the ANOVA results at the behavioral level. This result suggests that immediate JOLs enhanced recollection only when the cues embedded in the word pairs were useful on subsequent associative recall tests (here, cue-target relatedness), which is again consistent with the cue-strengthening hypothesis. It is noteworthy that, for both related pairs in Experiment 2 and strongly related pairs in Experiment 1, positive reactivity of item-level JOLs consistently appeared in the D parameter. This provides evidence that item-level JOLs improved memory for related pairs mainly by enhancing recollection of verbatim details. Although prestudy JOLs did not modify recall at the behavioral level, they did impair recollection for unrelated pairs at the process level. At first sight, this seems conceptually consistent with the changed-goal hypothesis.
Here, one may argue that the negative reactivity of prestudy JOLs was simply not strong enough to manifest itself at the behavioral level, because prestudy JOLs were less sensitive to cue-target relatedness than immediate JOLs (see Figure 3.1). On this view, participants’ awareness of differences in item difficulty may have been less sharp in the prestudy-JOL condition than in the immediate-JOL condition. They would then have had less motivation to switch learning goals, resulting in weaker JOL reactivity. However, this speculation cannot explain why prestudy and immediate JOLs produced qualitatively different effects at the process level: immediate JOLs enhanced recollection for related pairs, whereas prestudy JOLs impaired recollection for unrelated pairs. The changed-goal hypothesis assumes that both immediate and prestudy JOLs produce reactivity by prompting participants to switch learning goals and focus more on related pairs at the expense of unrelated pairs. If so, the effects of prestudy and immediate JOLs should differ only in magnitude, not in pattern.

Last, prestudy JOLs increased forgetting for both related and unrelated word pairs. Why? Recall that the duration of each study trial was matched across the three JOL conditions in the current experiment. Each word pair was presented for 4 s. In the prestudy-JOL condition, the pair was preceded by a 5-s JOL phase. In the immediate-JOL condition, it was followed by a 5-s JOL phase. In the no-JOL condition, it was followed by a 5-s attention check. In this arrangement, making a prestudy JOL for the next pair right after the presentation of the prior pair might have interfered with the consolidation of the prior pair. As a result, the verbatim traces of the word pairs may not have been rehearsed and stored as effectively in the prestudy-JOL condition as in the other two conditions, which would have made them less stable and thus more forgettable.
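The trial structure described above can be sketched in Python to make the interference argument concrete. This is a simplified sketch that assumes only the 4-s pair presentation and the 5-s JOL or attention-check phase mentioned in the text; any other intervals are ignored. The condition labels and function names are hypothetical. All three conditions have the same 9-s trial, but only in the prestudy-JOL condition is a pair immediately followed by a judgment about a different pair, namely the next one.

```python
from dataclasses import dataclass
from typing import List, Optional

PAIR_SECONDS = 4    # each word pair was shown for 4 s
PHASE_SECONDS = 5   # 5-s JOL phase or attention check


@dataclass(frozen=True)
class Event:
    kind: str        # "pair", "jol", or "attention_check"
    pair_index: int  # which word pair the event belongs to
    seconds: int


def trial(condition: str, i: int) -> List[Event]:
    """Events making up the study trial for pair i under a (hypothetical) condition label."""
    pair = Event("pair", i, PAIR_SECONDS)
    if condition == "prestudy":
        return [Event("jol", i, PHASE_SECONDS), pair]  # JOL precedes the pair
    if condition == "immediate":
        return [pair, Event("jol", i, PHASE_SECONDS)]  # JOL follows the pair
    if condition == "no_jol":
        return [pair, Event("attention_check", i, PHASE_SECONDS)]
    raise ValueError(f"unknown condition: {condition!r}")


def study_phase(condition: str, n_pairs: int) -> List[Event]:
    """Concatenate the trials for n_pairs consecutive word pairs."""
    return [event for i in range(n_pairs) for event in trial(condition, i)]


def event_after_pair(condition: str, i: int, n_pairs: int) -> Optional[Event]:
    """Return the event that immediately follows the presentation of pair i, if any."""
    events = study_phase(condition, n_pairs)
    pos = events.index(Event("pair", i, PAIR_SECONDS))
    return events[pos + 1] if pos + 1 < len(events) else None


for condition in ("prestudy", "immediate", "no_jol"):
    seconds = sum(e.seconds for e in trial(condition, 0))
    nxt = event_after_pair(condition, 0, n_pairs=3)
    print(f"{condition}: {seconds}-s trial; pair 0 is followed by {nxt.kind} for pair {nxt.pair_index}")
```

Under these assumptions, the prestudy-JOL schedule is the only one in which the post-presentation interval is spent evaluating a new pair instead of the one just seen.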
Notably, the increase in forgetting was larger for unrelated pairs than for related pairs. This is possibly because unrelated pairs require more cognitive resources to consolidate and therefore suffer more from interference with consolidation. The same account may explain why prestudy JOLs impaired recollection for unrelated but not related pairs, although future research would need to test this.

CHAPTER 4

EXPERIMENT 3

As shown in Double et al.’s (2018) meta-analysis, most prior JOL reactivity studies were conducted with word pairs, whereas only a few used lists of single words. As far as I am aware, Stevens and Pierce (2019) was the first study to examine JOL reactivity with categorical single-word lists. Moreover, in addition to the item-level JOLs used in most prior experiments, they examined reactivity of list-level JOLs, in which participants estimate how many words they expect to recall from a given word list (Mazzoni & Nelson, 1995; Sahakyan et al., 2004). Their results showed that list-level JOLs, but not item-level JOLs, improved cued recall for categorical lists. A cued recall test here is similar to a free recall test, the only difference being that the category labels of the studied list are presented as test cues. There is an important difference between item- and list-level JOLs: the former should direct participants’ attention to item-specific cues, whereas the latter should direct participants’ attention to inter-item relational cues. Because cued recall tests are more sensitive to relational cues than to item-specific cues, this difference explains why only list-level JOLs produced positive reactivity for categorical lists in cued recall. In addition, Myers et al. (2020) reported that asking participants to make item-level JOLs for related word pairs enhanced their subsequent performance on associative recall and recognition tests but not on free recall tests.
Associative recall and recognition tests provide participants with the cue words and require them to recall or recognize the target word paired with a specific cue word. In other words, participants must retain pair-specific cue-target relations. Free recall tests, by contrast, do not provide the cue words. They only require participants to recall as many target words as they can, in any order, without regard to the cue-target association within specific pairs. Thus, associative recall and recognition are sensitive to pair-specific cue-target relations, which item-level JOLs strengthen. Free recall is instead sensitive to inter-pair target-target relations, which item-level JOLs do little to strengthen. Taken together, Stevens and Pierce’s and Myers et al.’s studies are both in line with the cue-strengthening hypothesis: JOLs produce positive reactivity only when the cues they strengthen are favored by the subsequent memory test.

In Experiment 3, I further tested the cue-strengthening hypothesis using a design that built upon both Stevens and Pierce (2019) and Myers et al. (2020). Specifically, I used a 2 (Target-target relation: related, unrelated) × 2 (Test format: free recall, associative recall) × 3 (JOL condition: item-JOL, list-JOL, no-JOL) mixed design. Target-target relation and test format were manipulated within subjects, and JOL condition was manipulated between subjects. To manipulate target-target relation, every four consecutive word pairs were grouped into a list, and the targets of the four pairs on the same list were either semantically related or unrelated. Such target-target relatedness encourages inter-pair relational processing, in contrast to cue-target relatedness, which primarily invites item-specific processing.
Consistent with this notion, target-target relatedness has been found to enhance free recall, whereas it either impairs or has no effect on associative recall (Brainerd & Reyna, 2010; Schwenn & Underwood, 1968; Underwood et al., 1965).

In Experiment 3, the core prediction of the cue-strengthening hypothesis is that JOL reactivity depends on the interaction among material type, JOL type, and test format. First, list-level JOLs should produce positive reactivity for target-target related pairs but little-to-no reactivity for target-target unrelated pairs in free recall. This is because list-level JOLs direct participants’ attention to inter-pair target-target relations, to which free recall tests are sensitive. However, target-target unrelated pairs are less likely to enjoy such benefits than target-target related pairs, because they contain no inherent inter-pair relatedness for list-level JOLs to draw upon and strengthen. Second, list-level JOLs should produce either negative or no reactivity for target-target related pairs and little-to-no reactivity for target-target unrelated pairs in associative recall. Associative recall is sensitive to cue-target relatedness rather than target-target relatedness, and target-target relatedness is either harmful or irrelevant to associative recall (Brainerd & Reyna, 2010; Schwenn & Underwood, 1968; Underwood et al., 1965). Therefore, list-level JOLs, which focus attention on the relatedness among target-target related pairs, should not benefit associative recall. Third, item-level JOLs should produce negative or no reactivity for target-target related pairs and little-to-no reactivity for target-target unrelated pairs in free recall. Item-level JOLs are expected to emphasize item-specific features rather than inter-item relations (Mitchum et al., 2016; Myers et al., 2020), whereas free recall is more sensitive to the latter than to the former.
Fourth, item-level JOLs should also have little-to-no reactivity in associative recall for both target-target related and unrelated pairs. This is because within-pair semantic relatedness is a dominant cue for item-level JOLs, but that cue was absent from both target-target related and unrelated pairs. Thus, considering that no JOL reactivity was detected for cue-target unrelated pairs in prior studies (e.g., Soderstrom et al., 2015; Myers et al., 2020), item-level JOLs are not expected to improve associative recall for either target-target related or unrelated pairs.

In summary, the cue-strengthening hypothesis makes the following predictions for target-target related pairs. List-level JOLs should produce positive reactivity in free recall but negative or no reactivity in associative recall. Item-level JOLs should produce negative or no reactivity in free recall and little-to-no reactivity in associative recall. For target-target unrelated pairs, little-to-no reactivity of either item-level or list-level JOLs is predicted in either free or associative recall.

Method

Participants

Participants were 122 Cornell undergraduates (mean age = 20.15 years, SD = 1.23) who participated for course credit. Forty-two participants were randomly assigned to the item-JOL condition, 38 to the list-JOL condition, and 42 to the no-JOL condition. The sample size per condition was comparable to that used in Myers et al. (2020). One participant in the item-JOL condition failed to provide JOLs on over 80% of the study trials and was excluded from all analyses of JOLs and recall. In the end, data from 41 participants in the item-JOL condition, 38 in the list-JOL condition, and 42 in the no-JOL condition were included in the analyses.

Materials

The experiment was programmed and administered via Qualtrics.
The materials were 80 word pairs, which were evenly divided into 20 lists of four word pairs. The cue words of all pairs were chosen from the Nelson free association norms (D. L. Nelson et al., 2004). For half of the lists, the target words of the four word pairs on the same list were the first four exemplars of a category in the Van Overschelde et al. (2004) norms. For the other half, the target words of the four word pairs on a given list shared no inter-pair target-target relation, as they were randomly picked from four separate categories in the Van Overschelde et al. (2004) norms. I ensured that there was no cue-target association within any word pair, and that the concreteness, frequency, and length of both cue and target words were comparable between target-target related and unrelated pairs. The materials for Experiment 3 are in Appendix D.

Procedure

The experimental procedure was very similar to that of Experiment 1, with three modifications. First, in the study phase of each block, participants studied 10 lists of four word pairs instead of 36 word pairs. Second, participants took three consecutive free recall tests in one block and three consecutive associative recall tests in the other block. In neither block were participants told in advance whether they would take a free recall test or an associative recall test. The procedure for associative recall tests was the same as in Experiment 1. In free recall tests, participants had a maximum of 3 minutes to write down as many of the words on the right-hand side of the studied pairs as they could. They could write the words in any order and were told not to worry about spelling. Third, I used slightly different instructions for the item-JOL condition than in the prior experiments, and I added a list-JOL condition in parallel to the item-JOL and no-JOL conditions.
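The list construction described under Materials can be sketched as follows. The category norms below are a hypothetical miniature stand-in for the Van Overschelde et al. (2004) norms and are not the actual materials, which are listed in Appendix D. The cue words are taken from the examples in Figure 4.1, and the check that cues and targets are unassociated is assumed to happen separately. All function names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical miniature category norms (exemplars ordered by production frequency).
# These stand in for the Van Overschelde et al. (2004) norms; they are NOT the real materials.
NORMS: Dict[str, List[str]] = {
    "metals": ["steel", "iron", "bronze", "lead", "copper"],
    "fruits": ["apple", "orange", "banana", "lime", "pear"],
    "fish": ["salmon", "trout", "tuna", "cod", "bass"],
    "tools": ["hammer", "saw", "drill", "wrench", "pliers"],
    "birds": ["robin", "eagle", "sparrow", "crow", "hawk"],
}
LIST_LENGTH = 4


def related_targets(category: str) -> List[str]:
    """Related list: the first four exemplars of a single category."""
    return NORMS[category][:LIST_LENGTH]


def unrelated_targets(categories: List[str], rng: random.Random, used: Set[str]) -> List[str]:
    """Unrelated list: one randomly chosen, not-yet-used exemplar from each of four categories."""
    if len(set(categories)) != LIST_LENGTH:
        raise ValueError("an unrelated list needs four distinct categories")
    targets = []
    for category in categories:
        word = rng.choice([w for w in NORMS[category] if w not in used])
        used.add(word)  # avoid reusing a target on a later list
        targets.append(word)
    return targets


def make_pairs(cues: List[str], targets: List[str]) -> List[Tuple[str, str]]:
    """Pair cues with targets; cue-target unrelatedness is assumed to be checked separately."""
    return list(zip(cues, targets))


rng = random.Random(2022)
used_words: Set[str] = set(related_targets("metals"))
related_list = make_pairs(["convent", "rectangle", "circus", "toad"], related_targets("metals"))
unrelated_list = make_pairs(
    ["tall", "afraid", "sock", "atom"],
    unrelated_targets(["fruits", "fish", "tools", "birds"], rng, used_words),
)
print(related_list)
print(unrelated_list)
```

Drawing each unrelated list's targets from four distinct categories is what removes inter-pair target-target relatedness while keeping the items category exemplars, as in the related lists.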
In the item-JOL condition of the current experiment, participants were told that after the presentation of each pair, they would rate how likely they were to recall the word on the right-hand side of the pair on a later memory test (from 0 to 100, with 0 = not likely at all and 100 = totally likely). Because participants were not informed of the test format in advance, I removed the part of the JOL instruction that was specific to associative recall (“when provided with the word on the left-hand side on a later memory test”), which had been explicitly stated in the prior two experiments. Moreover, participants were explicitly informed that they might or might not be provided with the words on the left-hand side of the pairs on the later memory tests.

In the list-JOL condition, each word pair was presented for 10 s without any JOL prompt, just as in the no-JOL condition. After each list of four word pairs was presented, participants were prompted to make a list-level JOL during a 10-s interval between consecutive lists (“Among the words on the right-hand side of the four word pairs you just studied, how many of them do you expect to remember on a later memory test?”). Participants were required to enter a whole number between 0 and 4 into a blank box within 10 seconds. As with item-level JOLs, the instructions for list-level JOLs explicitly informed participants that they might or might not be provided with the words on the left-hand side of the pairs on the later memory test. An overview of the design of Experiment 3 is shown in Figure 4.1.

[Figure 4.1 shows three columns: Pair type (40 target-target related pairs, e.g., convent – steel, rectangle – iron, circus – bronze, toad – lead; 40 target-target unrelated pairs, e.g., tall – lime, afraid – pencil, sock – tangle, atom – salmon), JOL condition (item-level JOL, list-level JOL, no JOL), and Test format (free recall; associative recall, e.g., convent – ?).]

Figure 4.1. An overview of the design of Experiment 3. Pair type and test format were manipulated within subjects, while JOL condition was manipulated between subjects.

Results

ANOVA Results for JOLs

Two 2 (Target-target relation: related, unrelated) × 2 (Test format: associative recall, free recall) × 2 (Test order: free/associative, associative/free) mixed ANOVAs were conducted, one on item-level JOLs and one on list-level JOLs. Participants were not informed of the test format during the study phase of either block. Even so, during the study phase of the second block they may have expected the same test format as in the first block, which could have influenced their JOLs. To investigate this possibility, I included both test format and test order in the ANOVAs. The ANOVA for item-level JOLs showed a main effect of target-target relation: item-level JOLs were higher for target-target related pairs (M = 37.02, SD = 17.70) than for target-target unrelated pairs (M = 33.21, SD = 16.71), F(1, 38) = 13.03, MSE = 41.38, η²p = .26, p < .001. The ANOVA also revealed a main effect of test order. Item-level JOLs were higher when free recall was administered in the first block and associative recall in the second (M = 41.32, SD = 16.43) than in the reverse order (M = 28.76, SD = 15.80), F(1, 38) = 7.66, MSE = 844.33, η²p = .17, p = .009. There was a Test format × Target-target relation interaction, F(1, 38) = 5.10, MSE = 28.43, η²p = .12, p = .030. Nevertheless, post hoc tests revealed that item-level JOLs did not differ significantly between associative and free recall for either target-target related or unrelated pairs.
Additionally, a Test format × Test order interaction was present, F(1, 38) = 6.55, MSE = 132.50, η²p = .15, p = .015. Here, post hoc tests showed no significant difference in item-level JOLs between associative and free recall in either test order. Similarly, list-level JOLs were higher for pairs whose targets were related (M = 2.42, SD = .95) than for those whose targets were unrelated (M = 1.68, SD = .79), F(1, 35) = 45.53, MSE = .42, η²p = .57, p < .001. Thus, participants incorporated information about inter-pair target-target relations into both item- and list-level JOLs. In addition, there was a Test order × Target-target relation interaction, F(1, 35) = 4.91, MSE = .42, η²p = .12, p = .033. However, post hoc tests showed that the effect of target-target relatedness was significant in both test orders.

I also conducted an additional 2 (Target-target relation: related, unrelated) × 2 (JOL condition: item-JOL, list-JOL) mixed ANOVA to compare how sensitive item-level and list-level JOLs were to target-target relatedness. To put both JOL types on the same scale, I first converted list-level JOLs from a 0-4 scale to a 0-100 scale by dividing them by four and multiplying the result by 100. The ANOVA showed a main effect of target-target relation, F(1, 77) = 60.01, MSE = 28.17, η²p = .44, p < .001, a main effect of JOL condition, F(1, 77) = 10.69, MSE = 328.3, η²p = .12, p = .002, and a Target-target relation × JOL condition interaction, F(1, 77) = 10.80, MSE = 28.17, η²p = .12, p = .002. Post hoc tests revealed two results. First, list-level JOLs (after conversion) were higher overall than item-level JOLs, consistent with the means reported above. Second, the target-target relation effect was larger for list-level JOLs than for item-level JOLs. Therefore, item-level JOLs were less sensitive to target-target relatedness than list-level JOLs.
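The rescaling of list-level JOLs can be illustrated with a short Python sketch. The function name `list_jol_to_percent` is hypothetical. Applied to the mean JOLs reported above, the sketch also shows, at the level of condition means, why the target-target relatedness effect was larger for list-level than for item-level JOLs. The actual comparison was the participant-level ANOVA, not this calculation.

```python
def list_jol_to_percent(list_jol: float, list_length: int = 4) -> float:
    """Rescale a list-level JOL (0..list_length words) to the 0-100 item-level JOL scale."""
    if not 0 <= list_jol <= list_length:
        raise ValueError(f"list-level JOL must lie between 0 and {list_length}")
    return list_jol / list_length * 100


# Condition means reported above (item-level JOLs are already on a 0-100 scale).
item_related, item_unrelated = 37.02, 33.21
list_related = list_jol_to_percent(2.42)
list_unrelated = list_jol_to_percent(1.68)

print(f"list-level JOLs on 0-100 scale: related {list_related:.1f}, unrelated {list_unrelated:.1f}")
print(f"relatedness effect: item-level {item_related - item_unrelated:.2f}, "
      f"list-level {list_related - list_unrelated:.2f} points")
```

On the converted scale, the mean relatedness effect is about 18.5 points for list-level JOLs and about 3.8 points for item-level JOLs.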
ANOVA Results for Associative Recall

A 2 (Target-target relation: related, unrelated) × 3 (JOL condition: item-JOL, list-JOL, no-JOL) × 3 (Test: 1, 2, 3) × 2 (Test order: free/associative, associative/free) mixed ANOVA was conducted on associative recall. The ANOVA revealed:

- a main effect of target-target relation, F(1, 115) = 41.98, MSE = .02, η²p = .28, p < .001;
- a main effect of test, F(2, 230) = 19.63, MSE = .004, η²p = .15, p < .001;
- a Target-target relation × JOL condition interaction, F(2, 115) = 4.91, MSE = .02, η²p = .08, p = .009;
- a Target-target relation × Test order interaction, F(1, 115) = 5.83, MSE = .02, η²p = .05, p = .017;
- a Target-target relation × Test order × JOL condition interaction, F(2, 115) = 3.17, MSE = .02, η²p = .05, p = .046.

The two main effects arose because associative recall was higher for target-target related pairs (M = .27, SD = .25) than for target-target unrelated pairs (M = .21, SD = .21), and was better on the first recall test (M = .26, SD = .25) than on the second (M = .23, SD = .23) and third (M = .23, SD = .23) recall tests. Post hoc tests for the Target-target relation × JOL condition interaction showed that associative recall did not reliably differ among the three JOL conditions (item-JOL, list-JOL, no-JOL) for either target-target related or unrelated pairs. This suggests that neither item-level nor list-level JOLs produced reactivity in associative recall for either type of word pair (see Figure 4.2). Post hoc tests for the Target-target relation × Test order interaction revealed that associative recall was significantly higher for target-target related than for target-target unrelated pairs in both test orders. The target-target relatedness effect on associative recall was therefore robust, whether associative recall was administered in the first or the second block.
Given that the effect of target-target relation was not substantially modified by either JOL condition or test order, no further post hoc tests were conducted for the Target-target relation × Test order × JOL condition interaction.

Figure 4.2. Associative recall for target-target related and target-target unrelated pairs across item-, list-, and no-JOL conditions in Experiment 3. Panel A = recall test 1. Panel B = recall test 2. Panel C = recall test 3. Panel D = average recall across all three tests. Error bars are based on SEs.

ANOVA Results for Free Recall

The 2 (Target-target relation: related, unrelated) × 3 (JOL condition: item-level, list-level, no JOL) × 3 (Test: 1, 2, 3) × 2 (Test order: free/associative, associative/free) mixed ANOVA for free recall revealed a main effect of JOL condition, F(2, 115) = 5.22, MSE = .16, η²p = .08, p = .007. Free recall was higher in the list-JOL condition (M = .31, SD = .26) than in the item-JOL (M = .21, SD = .19) and no-JOL (M = .19, SD = .18) conditions, while the latter two conditions did not differ. A main effect of test order was also present, F(1, 115) = 7.73, MSE = .16, η²p = .06, p = .006; free recall was better when participants took the free recall test first. There was also a main effect of target-target relation, F(1, 118) = 101.50, MSE = .04, η²p = .46, p < .001, as free recall was better for target-target related pairs (M = .31, SD = .24) than for target-target unrelated pairs (M = .16, SD = .16). In addition, there was a Target-target relation × JOL condition interaction, F(2, 115) = 20.31, MSE = .04, η²p = .26, p < .001. As shown in Figure 4.3, the list-JOL condition outperformed the item- and no-JOL conditions in free recall only for target-target related pairs (Ms = .48 vs. .17 vs. .20). The advantage was not reliable for target-target unrelated pairs (Ms = .17 vs. .12 vs. .14).
Last, there was a Target-target relation × Test interaction, F(2, 230) = 4.47, MSE = .002, η²p = .04, p = .012. However, post hoc tests showed that the effect of target-target relation did not change substantially across the three free recall tests, as target-target related pairs were always recalled better than target-target unrelated pairs.

Figure 4.3. Free recall for target-target related and target-target unrelated pairs across item-, list-, and no-JOL conditions in Experiment 3. Panel A = recall test 1. Panel B = recall test 2. Panel C = recall test 3. Panel D = average recall across all three tests. Error bars are based on SEs.

Model Results for Associative Recall

The same dual-retrieval model (Chang, 2019) was used as in Experiments 1 and 2. As can be seen in the upper section of Table 4.1, the model delivered excellent fits to the associative recall data for all six combinations of JOL condition (item-, list-, and no-JOL) and target-target relation (related, unrelated). The average G²(1) was .87, which is again below the critical value of 3.84. The parameter estimates are also displayed in Table 4.1. For target-target related pairs, the F parameter was lower in the list-JOL condition than in the item-JOL condition (.03 vs. .20), ∆G² = 7.63, p = .006. This means that list-level JOLs reduced forgetting compared with item-level JOLs. In addition, the J2 and J3 parameters were both higher in the item-JOL condition (.95 and .95) than in the list-JOL (.61 and .49) and no-JOL (.72 and .62) conditions, ∆G²s > 5.24, ps < .022, whereas the latter two conditions did not differ reliably. In summary, list-level JOLs reduced forgetting for target-target related pairs compared with item-level JOLs. Item-level JOLs, in turn, made target-target related pairs feel more familiar after they were reconstructed by searching through a set of candidate items.
In addition, the ordering of the D parameter for target-target related pairs was item-JOL (.29) > list-JOL (.18) > no-JOL (.13). No pairwise comparison reached conventional statistical significance, although the difference between the item- and no-JOL conditions approached significance, ∆G² = 3.78, p = .052. For target-target unrelated pairs, as for related pairs, the F parameter was lower in the list-JOL condition than in the item-JOL condition (.00 vs. .21), ∆G² = 7.63, p = .006. The J2 and J3 parameters were also lower in the list-JOL condition (.55 and .50) than in the item-JOL condition (.93 and .90), ∆G²s > 9.25, ps < .003. This echoes the earlier finding that list-level JOLs reduced forgetting relative to item-level JOLs, whereas item-level JOLs enhanced familiarity relative to list-level JOLs. Additionally, the J1, J2, and J3 parameters were all lower in the list-JOL condition (.67, .55, and .50) than in the no-JOL condition (.80, .67, and .64), ∆G²s > 4.11, ps < .043. Thus, list-level JOLs systematically decreased familiarity for target-target unrelated word pairs. Meanwhile, the D parameter was significantly higher in the item-JOL condition than in the list-JOL and no-JOL conditions (.29 vs. .13 vs. .08), ∆G²s > 6.62, ps < .013. This suggests that item-level JOLs enhanced recollection of verbatim details for target-target unrelated pairs relative to the other two conditions. Last, the ordering of the R parameter was no-JOL (.24) > list-JOL (.21) > item-JOL (.05), with all pairwise comparisons significant, ∆G²s > 4.42, ps < .036. Therefore, participants found it harder to reconstruct the targets of target-target unrelated pairs when item-level or list-level JOLs were solicited than when no JOLs were solicited.
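The fit and comparison statistics used in these model analyses can be illustrated with a generic Python sketch. It assumes that each ∆G² is the increase in G² when one parameter is constrained to be equal across two conditions, so that ∆G² is referred to a chi-square distribution with 1 df (critical value 3.84 at α = .05). This is the usual convention for such nested-model comparisons. The helper names are hypothetical, and the dual-retrieval model itself (Chang, 2019) is not reproduced here. Computed p-values may differ slightly from the reported ones because the reported ∆G² values are rounded.

```python
import math
from typing import Sequence

CRITICAL_DF1 = 3.84  # chi-square critical value, df = 1, alpha = .05


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    With 1 df, X is a squared standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


def g_squared(observed: Sequence[float], expected_probs: Sequence[float]) -> float:
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)).

    expected_probs are model-predicted category probabilities (assumed positive and
    summing to 1); expected counts are E = N * p. Empty categories contribute 0.
    """
    n = sum(observed)
    return 2 * sum(o * math.log(o / (n * p)) for o, p in zip(observed, expected_probs) if o > 0)


# Fit: average G^2(1) over the six associative-recall fits in Table 4.1.
assoc_fits = [0.00, 1.61, 0.48, 0.14, 2.69, 0.29]
mean_fit = sum(assoc_fits) / len(assoc_fits)
print(f"mean G2(1) = {mean_fit:.2f}; below critical value: {mean_fit < CRITICAL_DF1}")

# Parameter comparisons: Delta G^2 values reported in the text.
for delta in (3.78, 7.63):
    print(f"Delta G2 = {delta:.2f}: p = {chi2_sf_df1(delta):.3f}")

# Toy example: 30/70 observed responses against a model predicting .25/.75.
print(f"toy G2 = {g_squared([30, 70], [0.25, 0.75]):.2f}")
```

With 1 df, the chi-square tail probability reduces to the complementary error function, so no statistics library is needed for this check.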
Table 4.1
Dual-Retrieval Model Fits and Parameter Estimates for Experiment 3

| Test format | Target-target relation | JOL condition | G² | D | F | J1 | J2 | J3 | R |
|---|---|---|---|---|---|---|---|---|---|
| Associative recall | Related | Item-JOL | .00 | .29 | .20 | .50 | .95 | .95 | .07 |
| Associative recall | Related | List-JOL | 1.61 | .18 | .03 | .65 | .61 | .49 | .23 |
| Associative recall | Related | No-JOL | .48 | .13 | .13 | .79 | .72 | .62 | .22 |
| Associative recall | Unrelated | Item-JOL | .14 | .29 | .21 | .40 | .93 | .90 | .05 |
| Associative recall | Unrelated | List-JOL | 2.69 | .13 | .00 | .67 | .55 | .50 | .21 |
| Associative recall | Unrelated | No-JOL | .29 | .08 | .10 | .80 | .67 | .64 | .24 |
| Free recall | Related | Item-JOL | .68 | .08 | .17 | .77 | .81 | .91 | .23 |
| Free recall | Related | List-JOL | .12 | .32 | .06 | .61 | .63 | .83 | .31 |
| Free recall | Related | No-JOL | .09 | .19 | .08 | .59 | .49 | .49 | .13 |
| Free recall | Unrelated | Item-JOL | .56 | .09 | .15 | .73 | .73 | .65 | .14 |
| Free recall | Unrelated | List-JOL | .05 | .08 | .20 | .70 | .74 | .73 | .16 |
| Free recall | Unrelated | No-JOL | .21 | .09 | .22 | .55 | .52 | .57 | .15 |

Note. D = direct access parameter; F = forgetting parameter; J1 = familiarity judgment parameter for test 1; J2 = familiarity judgment parameter for test 2; J3 = familiarity judgment parameter for test 3; R = reconstruction parameter. Parameters that differed reliably across JOL conditions are printed in boldface.

Model Results for Free Recall

The free recall data were fit to the same dual-retrieval model as the associative recall data (Chang, 2019). As can be seen in the lower section of Table 4.1, the model also delivered excellent fits to the free recall data for all six combinations of JOL condition (item-, list-, and no-JOL) and target-target relation (related, unrelated), with an average G²(1) of .29. For target-target related pairs, the ordering of the D parameter was list-JOL (.32) > no-JOL (.19) > item-JOL (.08), with all pairwise comparisons significant, ∆G²s > 7.94, ps < .005. This suggests that list-level JOLs enhanced direct access to verbatim traces for target-target related pairs in free recall, whereas item-level JOLs impaired it. Next, the J2 and J3 parameters were higher in the item-JOL condition (.81 and .91) and the list-JOL condition (.63 and .83) than in the no-JOL condition (.49 and .49), ∆G²s > 7.42, ps < .007.
In addition, the list- and item-JOL conditions differed significantly in the J2 parameter, ΔG² = 4.15, p = .042. These results suggest that both item-level and list-level JOLs increased familiarity for target-target related pairs in free recall, relative to the no-JOL condition. Last, the R parameter was significantly higher in the list-JOL condition (.31) and the item-JOL condition (.23) than in the no-JOL condition (.13), ΔG²s > 5.23, ps < .022. In contrast, no condition-wise differences were found in the parameters for target-target unrelated pairs.

Discussion

Consistent with the prediction of the cue-strengthening hypothesis, Experiment 3 showed that item-level JOLs produced no benefits for either target-target related or unrelated pairs, in either associative or free recall. List-level JOLs had no effects on target-target unrelated pairs in either type of recall. However, they improved free recall (but not associative recall) for target-target related pairs. Note that the cue-strengthening hypothesis holds that JOL reactivity arises only when the cues strengthened by JOLs match the cues used in the memory test. Here, target-target related pairs, list-level JOLs, and free recall tests were combined. In that case, making list-level JOLs should strengthen the target-target relatedness among pairs, to which free recall is very sensitive. Thus, list-level JOLs produced positive reactivity for target-target related pairs in free recall, as predicted by the cue-strengthening hypothesis. Alternative explanations for this positive reactivity are that list-level JOLs simply offered spaced retrieval practice, or that they enhanced participants’ expectancy of a free recall test. However, such accounts would have difficulty explaining one contrast. List-level JOLs produced a dramatic improvement in free recall for target-target related pairs but none at all for target-target unrelated pairs.
There were only four pairs (and thus only four target words) on a list, and retrieval practice usually produces a robust boost in memory performance (see Karpicke, 2017, for a review). Suppose participants were using list-level JOLs merely as retrieval practice, or list-level JOLs prompted them to prepare for a free recall test. In either case, there should also have been recall benefits for target-target unrelated pairs. Additionally, as will be seen in Experiment 4, list-level JOLs did not enhance free recall for blocked categorical lists. This would be hard to explain if list-level JOLs merely functioned as retrieval practice. The cue-strengthening hypothesis requires a match in cues among study materials, JOLs, and memory tests. Apart from the combination of target-target related pairs, list-level JOLs, and free recall, all other scenarios failed to provide that match. Thus, it is not surprising that no JOL reactivity was observed for them. For example, list-level JOLs produced no reactivity for target-target unrelated pairs in free recall. There was no semantic relation between the target words of consecutive pairs, so list-level JOLs were less likely to draw upon and strengthen target-target relations. In addition, list-level JOLs produced no reactivity for target-target related pairs in associative recall. Associative recall is sensitive to information specific to each pair rather than to inter-pair relations. Similarly, item-level JOLs produced no reactivity for target-target related or unrelated pairs in associative recall. Associative recall is particularly sensitive to within-pair relations, but neither type of pair had inherent cue-target relatedness for item-level JOLs to strengthen. Last, item-level JOLs also produced no reactivity for target-target related pairs in free recall. Item-level JOLs could not strengthen the cues favored by free recall: inter-pair relational cues.
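The scenario-by-scenario reasoning above can be summarized as a simple matching rule. Positive reactivity is expected only when a cue is (a) afforded by the materials, (b) strengthened by the type of JOL, and (c) used by the test. The Python sketch below encodes that rule for the eight combinations in Experiment 3. The cue labels ("within_pair", "inter_pair") and the set-intersection rule are hypothetical. They are an illustrative simplification of the verbal argument, not a fitted or formal model.

```python
from itertools import product

# Hypothetical cue labels used only to illustrate the cue-strengthening argument.
# Neither pair type has inherent cue-target relatedness, so "within_pair"
# relational cues are not afforded by the materials.
MATERIAL_CUES = {
    "related": {"inter_pair"},  # targets of consecutive pairs share a category
    "unrelated": set(),         # no relational cue to strengthen
}

# Cues each JOL type is assumed to draw on (and thereby strengthen).
JOL_CUES = {
    "item": {"within_pair"},
    "list": {"inter_pair"},
}

# Cues each test format is assumed to be sensitive to.
TEST_CUES = {
    "associative": {"within_pair"},
    "free": {"inter_pair"},
}


def predicts_positive_reactivity(relation: str, jol: str, test: str) -> bool:
    """True only if some cue is afforded by the materials, strengthened by
    the JOL, and used by the test, i.e., the three-way match is satisfied."""
    return bool(MATERIAL_CUES[relation] & JOL_CUES[jol] & TEST_CUES[test])


# Iterating over a dict yields its keys, so product() enumerates all 2 x 2 x 2 cells.
predictions = {
    combo: predicts_positive_reactivity(*combo)
    for combo in product(MATERIAL_CUES, JOL_CUES, TEST_CUES)
}
for combo, positive in predictions.items():
    print(combo, "positive reactivity" if positive else "no reactivity")
```

Under these assumptions, only the combination of related pairs, list-level JOLs, and free recall yields a positive prediction. This matches the pattern of recall results described above.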
Additionally, both item-level and list-level JOLs were higher for target-target related pairs than for target-target unrelated pairs. This suggests that participants perceived the former as easier to remember than the latter. Recall that the changed-goal hypothesis assumes that JOLs change people’s study goals and prompt them to allocate more resources to easier items at the cost of harder items. Thus, in the current scenario, the changed-goal hypothesis predicts positive reactivity for target-target related pairs and negative reactivity for target-target unrelated pairs, regardless of test format. However, this was not what the data showed. Negative reactivity was found for neither type of pair, and positive reactivity for target-target related pairs occurred only in free recall, not in associative recall. Therefore, although the primary goal of Experiment 3 was to test the cue-strengthening hypothesis, the results again provide counterevidence against the changed-goal hypothesis. A slightly surprising finding is that associative recall was better overall for target-target related pairs than for target-target unrelated pairs. Prior studies typically showed no effects or negative effects of target-target relation on associative recall (Brainerd & Reyna, 2010; Schwenn & Underwood, 1968; Underwood et al., 1965). Rivers and Dunlosky’s (2021) recent findings provide a possible explanation. Suppose participants were told at the beginning that they would take an associative recall test. Then their recall was similar for target-target related and unrelated pairs in both associative and free recall. However, when participants were instructed to expect a free recall test, they had better recall for target-target related pairs than for target-target unrelated pairs in both test formats.
Thus, the effects of target-target relatedness on recall performance (across both associative and free recall) seem to be moderated by test expectancy. In the current study, both associative and free recall were higher for target-target related pairs than for target-target unrelated pairs. This suggests that participants might have been more inclined overall to expect a free recall test than an associative recall test, even though no explicit information about the test format was provided until the test phase. Such a tendency to expect free recall may be attributable to the salient target-target relatedness among half of the pairs. That relatedness may have prompted participants to focus more on remembering the target words across pairs than on remembering specific cue-target pairings. However, this explanation is post hoc and speculative. Future research should test whether study materials can modify participants’ expectations about test format.

At a more fine-grained level, the dual-retrieval model revealed that list-level JOLs improved free recall for target-target related pairs by driving up the D, R, and J parameters. That is, making list-level JOLs helped participants better grasp the semantic connections among the target words across word pairs. This had three consequences:
(a) it gave them better access to the verbatim details of each target word;
(b) it helped them reconstruct target words from category membership when those words could not be directly recollected;
(c) it made the reconstructed words more likely to be output on the basis of perceived familiarity.
In sum, list-level JOLs turned out to be a sledgehammer operation for target-target related pairs in free recall. They enhanced both item-specific verbatim processing and relational gist processing. Moreover, an intriguing contrast emerged in the D, R, and J parameters between associative recall and free recall.
First, for target-target related pairs, the ordering of the D parameter across the three JOL conditions was reversed between test formats. It was item-JOL > list-JOL > no-JOL for associative recall, but list-JOL > no-JOL > item-JOL for free recall. Item-level JOLs marginally improved direct verbatim access relative to the no-JOL condition in associative recall, but they reduced it in free recall. In contrast, list-level JOLs did not affect direct access during associative recall, but they enhanced it during free recall. Note that item-level JOLs should prompt participants to focus more on item-specific processing and divert them from inter-item relational processing. List-level JOLs, by contrast, should favor inter-item relational processing over item-specific processing. Meanwhile, associative recall is sensitive to item-specific cues, whereas free recall relies heavily on inter-item relational processing. Thus, JOLs appear to have boosted the D parameter only when their cue preferences were consistent with those of the memory test. This was the case when list-level JOLs were followed by free recall, or when item-level JOLs were followed by associative recall. Second, list-level JOLs reduced both the R and J parameters for target-target unrelated pairs in associative recall, but they increased both parameters for target-target related pairs in free recall. In the former scenario, associative recall favors item-specific cues rather than inter-item relations. However, list-level JOLs prompted participants to focus on inter-item relations instead of item-specific features, even though the target words of consecutive pairs were not meaningfully related. Therefore, list-level JOLs misguided the encoding of target-target unrelated pairs. This in turn disrupted the reconstruction of those items. Because those items also felt less familiar, participants were discouraged from outputting the reconstructed items.
In the latter scenario, by contrast, free recall relies heavily on inter-item relational cues. Because the target words of target-target related pairs shared category membership, list-level JOLs effectively facilitated the processing of inter-item relational cues. The strengthened relational information thus made the target words of target-target related pairs easier to reconstruct when they could not be recollected. It also made the reconstructed items more likely to be output because they seemed more familiar. In sum, item- and list-level JOLs affected the D, R, and J parameters differently across test formats. This suggests that JOL reactivity depends heavily on transfer appropriateness, which aligns with the cue-strengthening hypothesis.

CHAPTER 5
EXPERIMENT 4

As mentioned previously, Stevens and Pierce (2019; Experiment 2) found no reactive effects of item-level JOLs on recall of categorical word lists. However, in their Experiment 3, list-level JOLs produced a significant recall improvement relative to the no-JOL condition. Thus, they concluded that item-level JOLs produce no reactivity in recall of categorical lists, whereas list-level JOLs produce positive reactivity. In contrast, Senkova and Otani (2021) reported that item-level JOLs did produce positive reactivity in recall of categorical lists. Moreover, they proposed that this positive item-level JOL reactivity results from enhanced item-specific processing. Their basis was that memory improved about as much with item-level JOLs as with two typical item-specific processing manipulations (Experiment 1: pleasantness rating; Experiment 2: mental imagery). A methodological discrepancy between the two studies may account for the inconsistent findings.
In Stevens and Pierce’s (2019) experiments, the categorical lists were presented in a blocked manner, so words belonging to the same category were always presented consecutively. In Senkova and Otani’s (2021) experiments, however, the order of words was randomized across the categorical lists, so words belonging to the same category were not presented consecutively. Thus, reactivity of item-level JOLs in recall of categorical lists may be constrained by list organization (randomized vs. blocked). Experiment 4 had two aims: to reconcile the mixed findings of Stevens and Pierce (2019) and Senkova and Otani (2021), and to revisit Senkova and Otani’s item-specific hypothesis.

For the first aim, I examined whether the two previous findings could be replicated within a single experiment. One is Senkova and Otani’s finding that item-level JOLs enhanced recall for categorical lists presented in a randomized format. The other is Stevens and Pierce’s finding that item-level JOLs failed to affect recall for categorical lists presented in a blocked format. If both findings were replicated in the current experiment with standardized word lists and procedures, I would be able to attribute the contradictory results to the difference in list organization.

For the second aim, consider Senkova and Otani’s item-specific hypothesis. According to this hypothesis, item-level JOLs improve recall for categorical lists by enhancing item-specific processing. Categorical lists do not readily activate such processing because they favor relational processing. If reactivity of item-level JOLs is indeed driven by enhanced item-specific processing, then positive reactivity should be observed for both blocked and randomized categorical lists.
Additionally, Senkova and Otani noted a caveat in their discussion. Item-level JOLs produced performance similar to conditions known to enhance item-specific processing (pleasantness rating or mental imagery), but similar performance does not guarantee similar underlying processes. The dual-retrieval model used in the current study can remove this uncertainty by delivering quantitative estimates of separate underlying processes. Suppose enhanced item-specific processing accounts for reactivity of item-level JOLs. Then the effects of item-level JOLs should appear in the parameters that pertain to item-specific processing, namely the direct access (D) or forgetting of direct access (F) parameters.

In Experiment 4, the organization of categorical lists (randomized vs. blocked) and JOL condition (item-JOL, list-JOL, and no-JOL) were factorially manipulated, both between subjects. In addition to item-level JOLs, list-level JOLs were administered as in Stevens and Pierce’s study. The aim was to replicate their finding of positive reactivity of list-level JOLs on recall for blocked categorical lists. List-level JOLs were not expected to produce positive reactivity on recall for randomized categorical lists. In those lists, words from different categories were intermixed, so there were no coherent categorical relations within individual lists.

Method

Participants

Participants were 240 young adults (mean age = 24.02 years, SD = 4.44) recruited from Prolific. All participants were fluent English speakers located in the United States, Canada, or the United Kingdom, and each was paid $2.33 for participation. Participants were first randomly assigned to the item-JOL, list-JOL, or no-JOL condition. Then, within each of the three JOL conditions, participants were randomly assigned to either a randomized list condition or a blocked list condition.
Of the 80 participants assigned to the item-JOL condition, 45 were assigned to the blocked list condition and 35 to the randomized list condition. Of the 81 participants assigned to the list-JOL condition, 36 were assigned to the blocked list condition and 45 to the randomized list condition. Of the 79 participants assigned to the no-JOL condition, 42 were assigned to the blocked list condition and 37 to the randomized list condition. Because of a technical error in Qualtrics, assignment was slightly imbalanced between the randomized and blocked list conditions. Nevertheless, the sample size in every condition of Experiment 4 was comparable to or larger than that in Senkova and Otani (2021; Experiment 1).

Materials

The experiment was programmed and administered via Qualtrics. The study material was a 40-word list consisting of words from five 8-word categorical lists. I used the four categorical lists from Senkova and Otani (2021) and added a fifth, constructed in the same way from the Van Overschelde et al. (2004) category norms. In the blocked condition, the words of each of the five categorical lists were presented consecutively. In the randomized condition, the words of the five categorical lists were randomly mixed and regrouped into five new lists. The constraint was that no more than three consecutive words could come from the same categorical list. For both blocked and randomized lists, the order of words within each list was fixed across participants, while the order of lists was randomized for each participant. The word lists used in Experiment 4 are displayed in Appendix E.

Procedure

Participants were randomly assigned to the item-JOL, list-JOL, or no-JOL condition.
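The list construction described under Materials can be sketched in Python as follows. It has two parts: a single constrained shuffle that is then held fixed, and a per-participant shuffle of list order. This is one possible implementation, not the procedure actually used. Several elements are hypothetical: the placeholder words, the seeds, the rejection-sampling approach, and the assumption that the three-in-a-row constraint applies across the whole 40-word sequence.

```python
import random
from itertools import groupby

N_CATEGORIES = 5
WORDS_PER_CATEGORY = 8
LIST_SIZE = 8
MAX_RUN = 3  # no more than three consecutive words from the same category


def max_run_length(labels):
    """Length of the longest run of identical consecutive labels."""
    return max(len(list(group)) for _, group in groupby(labels))


def build_randomized_lists(categories, rng, max_run=MAX_RUN, list_size=LIST_SIZE):
    """Reshuffle all words until no category run exceeds max_run (rejection
    sampling), then cut the sequence into consecutive lists of list_size."""
    pool = [(cat, word) for cat, words in categories.items() for word in words]
    while True:
        rng.shuffle(pool)
        if max_run_length(cat for cat, _ in pool) <= max_run:
            break
    return [pool[i:i + list_size] for i in range(0, len(pool), list_size)]


def list_order_for_participant(lists, participant_rng):
    """Word order within each list stays fixed; only the order of lists is shuffled."""
    order = list(lists)
    participant_rng.shuffle(order)
    return order


# Hypothetical placeholder words; the real lists appear in Appendix E.
categories = {
    f"category{c}": [f"c{c}_word{w}" for w in range(WORDS_PER_CATEGORY)]
    for c in range(N_CATEGORIES)
}

# Built once with a fixed (hypothetical) seed, then reused for every participant.
fixed_lists = build_randomized_lists(categories, random.Random(2022))

# Each participant receives the same five lists in an independently shuffled order.
session = list_order_for_participant(fixed_lists, random.Random(7))
for position, word_list in enumerate(session, start=1):
    print(position, [word for _, word in word_list])
```

Rejection sampling keeps the sketch simple. A random order of 40 words drawn from five equal categories only occasionally contains a run of four, so few reshuffles are typically needed.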
All participants completed a study phase and a test phase. In the study phase, participants studied 40 words, each presented for 2 seconds. In the item-JOL condition, each word disappeared after its 2-second presentation and a JOL prompt (“Likelihood to recall?”) appeared. Participants rated how likely they were to recall the word on a later memory test, from 0 (not likely at all) to 100 (totally likely). They were told to fine-tune their judgments by using the whole 100-point percentage scale. Participants had a maximum of 4 seconds to make each JOL, which they typed into a blank box under the JOL prompt. When the 4 seconds were up, the program automatically proceeded to the next word. In the no-JOL condition, the only difference from the item-JOL condition was that the JOL task was replaced by a random number generation task, as in Senkova and Otani (2021). Specifically, I asked participants to generate a random number between 0 and 100 within 4 seconds after the presentation of each word. In the list-JOL condition, participants also generated a random number after each word, just as in the no-JOL condition. In addition, participants made a list-level JOL after each list of eight words, during a 10-s interval between consecutive lists (“How many words do you expect to remember from the list on a later memory test?”). Participants entered a whole number between 0 and 8 into a blank box. The procedure for the test phase was the same as that for the free recall tests in Experiment 3.

Results

ANOVA Results for JOLs

To examine the effects of list organization (blocked vs. randomized) on JOLs, I conducted two separate one-way ANOVAs, one for item-level JOLs and one for list-level JOLs.
The effect of list organization on item-level JOLs approached significance, F(1, 78) = 2.96, MSE = 305.31, ηp² = .04, p = .089. Item-level JOLs were marginally higher for blocked categorical lists (M = 57.36, SD = 15.46) than for randomized categorical lists (M = 50.58, SD = 18.88). Meanwhile, list-level JOLs were significantly higher for blocked categorical lists (M = 4.89, SD = 1.19) than for randomized categorical lists (M = 3.73, SD = 1.06), F(1, 79) = 20.89, MSE = 1.29, ηp² = .21, p < .001. As in Experiment 3, I converted list-level JOLs to a 0-100 scale and conducted an additional between-subjects ANOVA. It had a 2 (List organization: blocked, randomized) × 2 (JOL condition: item-JOL, list-JOL) design and tested whether list-level JOLs were more sensitive to list organization than item-level JOLs. The ANOVA showed only a main effect of list organization, F(1, 157) = 17.77, MSE = 253.31, ηp² = .10, p < .001. There was no List organization × JOL condition interaction, suggesting that item- and list-level JOLs did not differ significantly in their sensitivity to list organization.

ANOVA Results for Recall

I first conducted a 2 (List organization: blocked, randomized) × 3 (JOL condition: item-JOL, list-JOL, no-JOL) × 3 (Test: 1, 2, 3) mixed ANOVA on recall. The ANOVA revealed three main effects:
(a) list organization, F(1, 234) = 13.24, MSE = .11, ηp² = .05, p < .001;
(b) JOL condition, F(2, 234) = 3.14, MSE = .11, ηp² = .03, p = .045;
(c) test, F(2, 468) = 5.00, MSE = .004, ηp² = .02, p = .007.
As can be seen in Figure 5.1, these main effects reflected the following differences:
(a) recall was higher for blocked categorical lists (M = .45, SD = .21) than for randomized categorical lists (M = .37, SD = .18);
(b) recall was higher in the item-JOL condition (M = .45, SD = .17) than in the list-JOL condition (M = .38, SD = .23);
(c) recall was higher on the first recall test (M = .42, SD = .19) than on the second (M = .40, SD = .20) or third recall test (M = .41, SD = .21).
Additionally, there was a JOL condition × Test interaction, F(4, 468) = 2.99, MSE = .004, ηp² = .03, p = .019, although post hoc tests showed that the JOL condition effect was significant on all three recall tests. Last, the interaction of primary interest, JOL condition × List organization, approached significance, F(2, 234) = 2.49, MSE = .11, ηp² = .02, p = .085.

Figure 5.1. Free recall for blocked categorical lists and randomized categorical lists across the item-, list-, and no-JOL conditions in Experiment 4. Panel A = recall test 1. Panel B = recall test 2. Panel C = recall test 3. Panel D = average recall across all three tests. Error bars are based on SEs.

Recall that one aim of Experiment 4 was to replicate a finding of Senkova and Otani (2021): recall for randomized categorical lists was higher in the item-JOL condition than in the no-JOL condition. I also hypothesized that list-level JOLs would not enhance recall for randomized categorical lists. The JOL condition × List organization interaction did not reach the conventional criterion of statistical significance. Even so, I conducted a planned one-way ANOVA comparing recall across the item-JOL, list-JOL, and no-JOL conditions specifically for randomized categorical lists. Recall differed significantly across the three test cycles, and Senkova and Otani administered only a single test cycle. For comparability, I therefore included only test 1 data in this planned analysis. Likewise, following Senkova and Otani, I used least significant difference (LSD) tests for post hoc analyses. An inspection of Figure 5.1 shows that test 1 recall displayed a pattern very similar to that of average recall across tests 1-3. The one-way ANOVA showed a significant main effect of JOL condition, F(2, 115) = 4.80, MSE = .03, ηp² = .08, p = .010.
LSD tests showed that the item-JOL condition (M = .44, SD = .16) produced higher recall for randomized categorical lists than both the list-JOL condition (M = .32, SD = .18) and the no-JOL condition (M = .36, SD = .18), ps = .004 and .039, respectively. Therefore, I successfully replicated Senkova and Otani’s result that the item-JOL condition produced better recall for randomized categorical lists than the no-JOL condition. Additionally, as predicted, recall for randomized categorical lists did not differ between the list- and no-JOL conditions, indicating that list-level JOLs produced no reactivity in this scenario.

Another aim of Experiment 4 was to replicate Stevens and Pierce’s (2019) finding for blocked categorical lists. In their study, the item-JOL condition did not improve recall relative to the no-JOL condition, whereas the list-JOL condition did. Therefore, I also conducted a separate one-way ANOVA comparing recall across the item-, list-, and no-JOL conditions specifically for blocked categorical lists. Because Stevens and Pierce administered only one test cycle, I again restricted this analysis to test 1 data. As shown in Figure 5.1, test 1 recall for blocked categorical lists appeared comparable across the three JOL conditions. Indeed, the ANOVA showed no difference in recall for blocked lists among the item-JOL (M = .47, SD = .17), list-JOL (M = .48, SD = .20), and no-JOL conditions (M = .44, SD = .17), F(2, 119) = .50, MSE = .03, ηp² = .008, p = .606. Like Stevens and Pierce, I found that item-level JOLs did not enhance recall for blocked lists. However, I did not replicate the recall enhancement they found in the list-JOL condition.

Model Results

The free recall data in Experiment 4 were fit to a slightly modified version of the dual-retrieval model used in Experiments 1, 2, and 3. The modification accommodated the methodological differences between Experiment 4 and the prior three experiments.
The prior three experiments all used word pairs followed by associative recall tests (although Experiment 3 used both associative and free recall tests), whereas Experiment 4 used single-word lists followed by free recall tests. The modified model was developed specifically for free recall of single-word lists. It has the same six parameters as the previous model: D, F, R, J1, J2, and J3. The only difference lies in the F parameter, which is now defined as the probability of forgetting on the second or third recall test. The previous model assumes that forgetting can occur only jointly on recall tests 2 and 3. The modified model also allows participants to retain direct access on the second recall test but lose it on the third. The probability of forgetting is assumed to be equal on the last two recall tests (see Appendix A for more details). As can be seen in Table 5.1, this modified model fit the recall data excellently for all combinations of JOL condition (item-, list-, and no-JOL) and list organization (blocked, randomized). The one exception was blocked lists in the list-JOL condition. The average G²(1) of 3.12 was still below the critical value of 3.84, suggesting that the model provided acceptable fits to the current data. For blocked categorical lists, the F parameter was lower in the item-JOL condition (.05) than in the list-JOL and no-JOL conditions (.10 and .09), ΔG²s > 8.44, ps < .004. This suggests that item-level JOLs buffered against forgetting for blocked categorical lists. Meanwhile, the J2 parameter was higher in the list-JOL condition (.70) than in the item-JOL condition (.54), ΔG² = 4.47, p = .034. This suggests that words followed by list-level JOLs felt more familiar on later recall tests than words followed by item-level JOLs.
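The summary fit statistic can be checked directly from the six G²(1) values in Table 5.1. The Python sketch below averages them and flags any cell above the .05 critical value of 3.84 for 1 df. It also reports each cell’s p value using the 1-df χ² tail, erfc(√(x/2)). The helper name and dictionary keys are hypothetical, and only the standard library is used.

```python
import math

CRITICAL_DF1 = 3.84  # .05 critical value of chi-square with 1 df


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for chi-square with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


# G^2(1) fit statistics from Table 5.1, keyed by (list organization, JOL condition).
g2_fits = {
    ("blocked", "item"): 0.10,
    ("blocked", "list"): 14.35,
    ("blocked", "none"): 0.37,
    ("randomized", "item"): 0.22,
    ("randomized", "list"): 3.63,
    ("randomized", "none"): 0.04,
}

mean_g2 = sum(g2_fits.values()) / len(g2_fits)
misfits = [cell for cell, g2 in g2_fits.items() if g2 > CRITICAL_DF1]

print(f"mean G2(1) = {mean_g2:.2f}")
print("cells above the critical value:", misfits)
for cell, g2 in g2_fits.items():
    print(cell, f"G2 = {g2:.2f}, p = {chi2_sf_df1(g2):.3f}")
```

Computed this way, the mean is 3.12. The blocked list-JOL cell (G² = 14.35) is the only one above the critical value, so the average is pulled upward largely by that single cell.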
Table 5.1
Dual-Retrieval Model Fits and Parameter Estimates for Experiment 4

List organization  JOL condition      G²    D    F   J1   J2   J3    R
Blocked            Item-JOL          .10  .42  .05  .44  .54  .80  .20
Blocked            List-JOL        14.35  .42  .10  .50  .70  .78  .20
Blocked            No-JOL            .37  .38  .09  .45  .63  .80  .22
Randomized         Item-JOL          .22  .36  .06  .58  .64  .83  .22
Randomized         List-JOL         3.63  .22  .10  .81  .65  .91  .16
Randomized         No-JOL            .04  .31  .07  .41  .62  .86  .17

Note. D = direct access parameter; F = forgetting parameter; J1 = familiarity judgment parameter for test 1; J2 = familiarity judgment parameter for test 2; J3 = familiarity judgment parameter for test 3; R = reconstruction parameter. Parameters that differed reliably between JOL conditions are printed in boldface.

The patterns were quite different for the randomized lists. Here, the ordering of the D parameter was item-JOL (.36) > no-JOL (.31) > list-JOL (.22). All pairwise comparisons were significant, with one exception: the item-JOL vs. no-JOL difference was on the boundary of statistical significance, ΔG²s ≥ 3.84, ps ≤ .050. This suggests that, compared with the no-JOL condition, item-level JOLs enhanced direct access to verbatim traces while list-level JOLs impaired it. Meanwhile, the ordering of the J1 parameter was list-JOL (.81) > item-JOL (.58) > no-JOL (.41). All pairwise comparisons yielded significant differences, ΔG²s > 4.11, ps < .043. This suggests that item-level and list-level JOLs both increased familiarity for reconstructed words on the randomized lists.

Discussion

Experiment 4 provided supporting evidence for my hypothesis about the discrepant findings of Stevens and Pierce (2019) and Senkova and Otani (2021). Namely, reactivity of item-level JOLs on free recall for categorical lists was bounded by list organization. Item-level JOLs produced positive reactivity when categorical lists were presented in a randomized manner but not when they were presented in a blocked manner.
The dual-retrieval model analyses revealed that the recall advantage for randomized lists in the item-JOL condition was driven by enhancements in both the D and J parameters. That is, relative to the no-JOL condition, the item-JOL condition provided better access to verbatim traces of the words’ presentations. It also increased the tendency for familiarity judgments to pass reconstructed words for output. Verbatim traces contain the literal surface details of specific items. This result therefore seems in harmony with Senkova and Otani’s hypothesis that item-level JOLs enhance memory by improving item-specific processing. However, Senkova and Otani’s item-specific hypothesis would have difficulty explaining why item-level JOLs enhanced recall for randomized but not for blocked categorical lists. According to this hypothesis, categorical lists encourage participants to engage in relational processing, whereas uncategorized lists promote item-specific processing. Suppose item-level JOLs enhance item-specific processing. Then they should improve memory more for categorical lists, where such processing is not already elicited, than for uncategorized lists, where it is readily provoked. If that is the case, the item-specific hypothesis predicts positive JOL reactivity for categorical lists whether the lists are randomized or blocked. Notably, item-level JOLs should produce even more robust positive reactivity with blocked categorical lists. Blocked lists induce stronger relational processing than randomized lists, so item-specific processing should benefit memory performance more. Why, then, did reactivity of item-level JOLs occur only for randomized categorical lists and not for blocked categorical lists?
The model analysis suggests one possible explanation. Positive JOL reactivity for randomized categorical lists may result from a combination of enhanced recollection of item-specific details and enhanced familiarity based on relational gist. As can be seen in Table 5.1, item-level JOLs improved both verbatim-based and gist-based retrieval processes (D and J1) for the randomized lists. For blocked lists, they affected only one verbatim-based process (F). Moreover, for randomized lists, the difference in the D parameter between the item- and no-JOL conditions was quite small and on the boundary of statistical significance. The difference in the J1 parameter, by contrast, was much larger and highly significant. Thus, item-level JOLs may well have improved item-specific processing. Even so, the improvement in relational processing may have been a necessary contributor to their positive reactivity for randomized categorical lists, because it increased the likelihood of outputting reconstructed words on the basis of familiarity. If that is the case, it follows that enhanced relational processing should be more beneficial for randomized than for blocked lists. Blocked lists already readily elicit such processing; randomized lists do not. That is, when categorically related words are presented consecutively, participants naturally focus on the semantic connections among list words. When such words are not presented consecutively, participants need more cognitive resources to grasp the semantic relations among words and regroup them under a common theme. Therefore, if positive reactivity of item-level JOLs for categorical lists was partially driven by relational processing, it should be stronger with randomized than with blocked categorical lists. However, this explanation is post hoc and speculative and needs to be examined further in future research.
Meanwhile, it was predicted that list-level JOLs should not produce reactivity for randomized categorical lists, because list-level JOLs direct participants’ attention to the relations among the words on the same list, which in randomized lists were not meaningfully related. Thus, no useful cues were strengthened in the process of making list-level JOLs, failing the precondition for JOL reactivity proposed by the cue-strengthening hypothesis. The results of Experiment 4 were consistent with this prediction. Experiment 4 also showed that neither item- nor list-level JOLs enhanced recall for blocked categorical lists. The former finding was consistent with Stevens and Pierce’s (2019) results, whereas the latter was not. One possible reason why list-level JOL reactivity for blocked lists was not replicated is the difference in test format: Stevens and Pierce used cued recall in their experiments, whereas I used free recall in the current experiment. Unlike free recall, cued recall provided category labels as test cues, which facilitates relational processing. Because list-level JOLs slant participants toward relational processing, cued recall should be more sensitive than free recall to the cues strengthened by list-level JOLs. Therefore, reactivity of list-level JOLs may have been too subtle to be captured by free recall in the current experiment but could have been picked up by cued recall in Stevens and Pierce’s (2019) experiment. Of course, other factors, such as differences in study materials and sample characteristics, may also have come into play, which will need to be determined by further replications. On a related note, another discrepancy arises from the different findings for reactivity of list-level JOLs between Experiments 3 and 4.
Given that list-level JOLs were found to improve free recall for target-target related pairs in Experiment 3, it was quite surprising that the free recall benefits disappeared when study materials were changed from word pairs to word lists. In that connection, it is noteworthy that word pairs should primarily invite participants to process the relation between cue and target within each pair, whereas blocked categorical word lists should primarily prompt participants to focus on the relations among individual words. Therefore, in Experiment 3, list-level JOLs encouraged participants to process target-target relatedness across word pairs, a cue that the word pairs themselves did not prioritize. However, in Experiment 4, list-level JOLs should have produced less improvement in relational processing, because such processing is already strongly encouraged by blocked categorical lists themselves. In other words, list-level JOLs stimulate processing that is complementary to target-target related word pairs but not to blocked categorical lists, which may explain why list-level JOLs produced positive reactivity in the former situation but not in the latter.

CHAPTER 6
GENERAL DISCUSSION

In the present dissertation, I examined the underlying mechanism of JOL reactivity by (a) testing the predictions of major theoretical hypotheses about JOL reactivity and (b) identifying which retrieval processes were modified by the solicitation of JOLs. To achieve these aims, I pitted the two leading theoretical accounts, the changed-goal hypothesis (Mitchum et al., 2016) and the cue-strengthening hypothesis (Soderstrom et al., 2015), against each other in Experiments 1 and 2 and tested further predictions of the cue-strengthening hypothesis in Experiment 3. In Experiment 4, I tested the recently proposed hypothesis that positive reactivity of item-level JOLs arises from enhanced item-specific processing (Senkova & Otani, 2021).
Moreover, I implemented the dual-retrieval model to estimate the underlying retrieval processes and tested which processes differed significantly between the conditions with JOLs and the condition without JOLs. Below, I first present a brief review of the experimental design, theoretical predictions, and behavioral findings of each of the four experiments. As will be seen, the first three experiments offered preferential support for the cue-strengthening hypothesis over the changed-goal hypothesis, while Experiment 4 provided counterevidence for the item-specific hypothesis. Then, I address what the model analyses across the four experiments reveal about the process-level mechanism of JOL reactivity. Last, I discuss the theoretical implications of these findings and offer recommendations for future research.

Summary of Main Methodologies, Hypotheses, and Behavioral Findings

In Experiment 1, I compared the reactive effects of JOLs among strongly related, weakly related, and identical word pairs by having participants either make item-level JOLs or make no JOLs for a mixed list of the three types of word pairs. Here, the changed-goal hypothesis assumes that making JOLs highlights the differences in learning difficulty among items and prompts participants to focus more on learning the least and moderately challenging items (identical and strongly related pairs) at the cost of the most difficult items (weakly related pairs). Thus, it predicts negative reactivity for weakly related pairs and positive reactivity for strongly related and identical pairs. However, the cue-strengthening hypothesis predicts positive reactivity for all three types of word pairs, because it assumes that making JOLs enhances the processing of the cues that inform JOLs (i.e., cue-target relation and cue-target identity in this scenario) and that positive reactivity should arise if the strengthened cues are useful on subsequent memory tests.
The results of Experiment 1 showed that associative recall was better in the item-JOL condition than in the no-JOL condition, and the effect of JOL condition was not moderated by word pair type. In other words, JOLs improved recall performance for strongly related, weakly related, and identical word pairs to a similar extent. This result is in line with the cue-strengthening hypothesis but not with the changed-goal hypothesis. Experiment 2 was designed to test the contrasting predictions of the changed-goal and cue-strengthening hypotheses about the reactive effect of prestudy JOLs. Unlike immediate JOLs, which are made after studying each item, prestudy JOLs are made before studying each item, with specific information provided about the upcoming item. In Experiment 2, participants were told whether they were going to study a related or an unrelated pair when making prestudy JOLs. Like immediate JOLs, prestudy JOLs were significantly higher for related pairs than for unrelated pairs (Mueller et al., 2013, 2016), suggesting that participants were aware that related pairs are more memorable than unrelated pairs when making prestudy JOLs. Thus, the changed-goal hypothesis predicts similar reactivity for prestudy and immediate JOLs. That is, making prestudy JOLs should similarly change participants’ learning goals and motivate them to focus more on learning related pairs at the expense of unrelated pairs, which would ultimately produce negative reactivity for unrelated pairs but positive reactivity for related pairs. However, the cue-strengthening hypothesis predicts either no or very weak reactivity of prestudy JOLs, because prestudy JOLs are formed on the basis of very limited intrinsic and extrinsic cues.
As an illustration, in Experiment 2, participants received the same prompt for all related pairs (“you are going to study a related pair”), which provided no item-specific cues that could help them recall the particular target for a given cue on the later test. Moreover, participants may use various encoding strategies when studying word pairs, such as interactive imagery (Wilton, 2006) or verbal elaboration (Jensen & Rohwer, 1963). Although participants might have a global sense of what strategies they would use when making prestudy JOLs, the strategy implemented for a specific pair would not be accessible until the pair was encoded. Furthermore, prestudy JOLs cannot be based on mnemonic cues, such as feelings of fluency or familiarity, because those cues are embedded in the encoding experience itself. Thus, because prestudy JOLs are based on fewer diagnostic cues, the cue-strengthening hypothesis predicts that they should produce either no reactivity or much weaker reactivity than immediate JOLs. Again, my results were consistent with the cue-strengthening hypothesis rather than the changed-goal hypothesis: Prestudy JOLs produced no reactivity for either related or unrelated pairs, whereas immediate JOLs produced positive reactivity for related pairs but no reactivity for unrelated pairs. Experiment 3 targeted the cue-strengthening hypothesis. Unlike in Experiments 1 and 2, semantic relation in Experiment 3 was manipulated not within pairs but between pairs. Namely, there was no relatedness between cue and target within each pair, but the target words of consecutive pairs were either categorically related or unrelated. Meanwhile, I solicited both item-level and list-level JOLs and compared them with a no-JOL control condition. Item-level JOLs were made after each word pair, whereas list-level JOLs were made after each list of four word pairs. Last, I administered either associative or free recall tests.
According to the cue-strengthening hypothesis, JOL reactivity arises only when (a) JOLs are capable of strengthening the cues embedded in the study materials and (b) the final test is sensitive to the cues that are strengthened by JOLs. Accordingly, item-level JOLs should produce negative or little-to-no reactivity for target-target related pairs in free recall, because they primarily enhance the processing of within-pair relations rather than inter-pair relations, whereas free recall is more sensitive to the latter than to the former (Mitchum et al., 2016; Myers et al., 2020). Item-level JOLs should also produce little-to-no reactivity for target-target related pairs in associative recall, because those pairs contained no cue-target relatedness, which associative recall favors. In contrast, list-level JOLs should produce positive reactivity for target-target related pairs in free recall but not in associative recall, because they primarily strengthen the processing of target-target relatedness, and only free recall is sensitive to such cues. Again, my results were consistent with these predictions: List-level JOLs improved free recall but not associative recall for target-target related pairs, and item-level JOLs had no effect on either associative or free recall for target-target related pairs. In Experiment 4, I tested the predictions of a recently proposed item-specific hypothesis (Senkova & Otani, 2021) and revisited the contradictory findings of Senkova and Otani (2021) and Stevens and Pierce (2019). On the one hand, Senkova and Otani (2021) found positive reactivity of item-level JOLs on recall for randomized categorical lists. On the other hand, Stevens and Pierce (2019) found no reactivity of item-level JOLs on recall for blocked categorical lists, but they did find positive reactivity of list-level JOLs. In Experiment 4, I again administered the three JOL conditions used in Experiment 3: item-JOL, list-JOL, and no-JOL conditions.
Meanwhile, I used categorical lists as study materials and presented them in either a randomized or a blocked format. If reactivity of item-level JOLs results from enhanced item-specific processing, which is complementary to the relational processing naturally provoked by categorical lists, item-level JOLs should improve recall for both randomized and blocked categorical lists. However, Experiment 4 replicated both Senkova and Otani’s and Stevens and Pierce’s findings in that positive reactivity of item-level JOLs occurred only for randomized, not for blocked, categorical lists, which contradicts the prediction of the item-specific hypothesis. Additionally, I did not replicate Stevens and Pierce’s finding that list-level JOLs improve recall for blocked categorical lists, possibly because of the difference in memory test format: I used free recall tests, whereas Stevens and Pierce used cued recall tests (i.e., free recall tests with category labels presented as test cues). A summary of the experiment designs and main recall findings for Experiments 1-4 is presented in Table 6.1. For a summary of the main theoretical predictions in Experiments 1-4, readers can refer back to Table 1.3. A comparison between Table 1.3 and Table 6.1 reveals that Experiments 1, 2, and 3 provide converging support for the cue-strengthening hypothesis rather than for the changed-goal hypothesis. Additionally, Experiment 4 did not lend support to the item-specific hypothesis.

Table 6.1
A Summary of the Experiment Designs and Recall Findings for Experiments 1-4

Experiment 1
Design: 3 (Pair type: weakly related, strongly related, identical) × 2 (JOL condition: item-JOL, no-JOL)
Main recall findings:
Identical pairs: Item-JOL > No-JOL
Strongly related pairs: Item-JOL > No-JOL
Weakly related pairs: Item-JOL > No-JOL

Experiment 2
Design: 2 (Pair type: related, unrelated) × 3 (JOL condition: prestudy-JOL, immediate-JOL, no-JOL)
Main recall findings:
Related pairs: Immediate-JOL > Prestudy-JOL = No-JOL
Unrelated pairs: Immediate-JOL = Prestudy-JOL = No-JOL

Experiment 3
Design: 2 (Target-target relation: related, unrelated) × 3 (JOL condition: item-JOL, list-JOL, no-JOL)
Main recall findings:
Target-target related pairs, free recall: List-JOL > Item-JOL = No-JOL
Target-target related pairs, associative recall: List-JOL = Item-JOL = No-JOL
Target-target unrelated pairs, free recall: List-JOL = Item-JOL = No-JOL
Target-target unrelated pairs, associative recall: List-JOL = Item-JOL = No-JOL

Experiment 4
Design: 2 (List organization: blocked, randomized) × 3 (JOL condition: item-JOL, list-JOL, no-JOL)
Main recall findings:
Blocked categorical lists: Item-JOL = List-JOL = No-JOL
Randomized categorical lists: Item-JOL > List-JOL = No-JOL

Note. JOL condition was manipulated between subjects in all four experiments. The other variables were manipulated within subjects, except for list organization in Experiment 4. Item-JOL, immediate-JOL, prestudy-JOL, list-JOL, and no-JOL refer to the corresponding JOL conditions. “=” means statistically equivalent recall (i.e., no reactivity), and “>” means significantly better recall (i.e., positive reactivity).

Process-Level Mechanisms for JOL Reactivity

The implementation of the dual-retrieval model allowed me to determine which retrieval processes are responsible for JOL reactivity. As a reminder, the dual-retrieval model delivers estimates for two clusters of retrieval parameters. One cluster is concerned with verbatim-based recollection and includes the direct access (D) parameters and the forgetting (F) parameters. The D parameters represent the probability that the prior presentation of a specific item is vividly reinstated in mind so that verbatim details of the item can be directly accessed. The F parameters represent the probability of losing direct access to verbatim details due to forgetting after the first recall test.
The other cluster is concerned with gist-based non-recollective operations and includes the reconstruction (R) parameter and the familiarity judgment (J) parameters. The former represents the probability of reconstructing an item from partial information when recollection is not possible, and the latter represent the probability that the reconstructed item passes a familiarity threshold and is output. A summary of the dual-retrieval model results for Experiments 1-4 is presented in Table 6.2. An inspection of Table 6.2 reveals both commonalities and differences in the dual-retrieval model results across the four experiments. On the one hand, despite changes in study materials, JOL type, and test format across the four experiments, positive JOL reactivity was always accompanied by increases in the D parameter. In Experiment 1, positive JOL reactivity was accompanied by an increase in the D parameter as well as a reduction in the F parameter for strongly related, weakly related, and identical word pairs. Additionally, there was an increase in the J3 parameter for weakly related pairs and an increase in the R parameter for identical pairs. In Experiment 2, immediate JOLs produced positive reactivity for related pairs, which was located specifically in the D parameter. In Experiment 3, list-level JOLs produced positive reactivity for target-target related pairs in free recall, which was the product of increases in the D, R, and J parameters. In Experiment 4, enhancements in the D and J parameters jointly contributed to the positive reactivity of item-level JOLs for randomized categorical lists. To sum up, enhanced direct access to verbatim details appears to be a stable contributor to positive JOL reactivity. Namely, whenever the solicitation of JOLs boosted subsequent memory performance, the boost was accompanied by participants being better at recollecting the surface details of the prior presentation of studied items.
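For readers less familiar with the model, the following Python sketch illustrates, in a deliberately simplified way, how the two clusters of parameters could jointly determine recall. It assumes that an item is output either through direct access (D) or, when direct access fails, through reconstruction (R) followed by a successful familiarity judgment (J), and that forgetting (F) removes direct access on tests after the first. This is a hypothetical toy for intuition only: it is not the full dual-retrieval model, nor the procedure used to estimate the parameters in these experiments, and the function name toy_recall_probability is illustrative.

```python
def toy_recall_probability(d: float, r: float, j: float, f: float = 0.0,
                           after_first_test: bool = False) -> float:
    """Toy single-item recall probability (hypothetical simplification).

    This is NOT the full dual-retrieval model; it only shows how a
    verbatim route (D, reduced by F after the first test) and a gist
    route (R followed by J) could combine into one recall probability.
    """
    for name, p in {"d": d, "r": r, "j": j, "f": f}.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability, got {p}")
    # Verbatim route: direct access, which forgetting can remove after test 1.
    direct = d * (1.0 - f) if after_first_test else d
    # Gist route: tried only when direct access fails; the reconstructed
    # item must also pass the familiarity judgment to be output.
    return direct + (1.0 - direct) * r * j


if __name__ == "__main__":
    baseline = toy_recall_probability(d=0.30, r=0.50, j=0.60)
    higher_d = toy_recall_probability(d=0.40, r=0.50, j=0.60)
    higher_j = toy_recall_probability(d=0.30, r=0.50, j=0.80)
    print(f"baseline={baseline:.3f}, higher D={higher_d:.3f}, higher J={higher_j:.3f}")
```

In this toy, raising either D (from 0.30 to 0.40) or J (from 0.60 to 0.80) raises the predicted recall probability from 0.51 to 0.58, which mirrors, in spirit only, the point that similar behavioral gains can arise from verbatim-based or gist-based processes.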
Table 6.2
A Summary of the Dual-Retrieval Model Findings for Experiments 1-4

Experiment 1
Identical pairs: D: Item-JOL > No-JOL; F: Item-JOL < No-JOL; R: Item-JOL > No-JOL
Weakly related pairs: D: Item-JOL > No-JOL; F: Item-JOL < No-JOL; J3: Item-JOL > No-JOL
Strongly related pairs: D: Item-JOL > No-JOL; F: Item-JOL < No-JOL

Experiment 2
Related pairs: D: Immediate-JOL > Prestudy-JOL = No-JOL; F: Prestudy-JOL > Immediate-JOL = No-JOL
Unrelated pairs: D: Immediate-JOL = No-JOL > Prestudy-JOL; F: Prestudy-JOL > Immediate-JOL = No-JOL

Experiment 3
Target-target related pairs, associative recall: F: Item-JOL > List-JOL; J2: Item-JOL > No-JOL = List-JOL; J3: Item-JOL > No-JOL = List-JOL
Target-target related pairs, free recall: D: List-JOL > No-JOL > Item-JOL; R: List-JOL = Item-JOL > No-JOL; J2: Item-JOL > List-JOL > No-JOL; J3: Item-JOL = List-JOL > No-JOL
Target-target unrelated pairs, associative recall: D: Item-JOL > List-JOL = No-JOL; F: Item-JOL > List-JOL; R: No-JOL > List-JOL > Item-JOL; J1: No-JOL > List-JOL; J2: Item-JOL = No-JOL > List-JOL; J3: Item-JOL = No-JOL > List-JOL
Target-target unrelated pairs, free recall: Null finding

Experiment 4
Blocked categorical lists: F: List-JOL > Item-JOL = No-JOL; J2: List-JOL > Item-JOL
Randomized categorical lists: D: Item-JOL ≥ No-JOL > List-JOL; J1: List-JOL > Item-JOL > No-JOL

Note. Item-JOL, immediate-JOL, prestudy-JOL, list-JOL, and no-JOL refer to the corresponding JOL conditions. D = direct access parameter; F = forgetting parameter; J1 = familiarity judgment parameter for Test 1; J2 = familiarity judgment parameter for Test 2; J3 = familiarity judgment parameter for Test 3; R = reconstruction parameter. “=” means statistically equivalent, “>” means significantly higher, “≥” means marginally higher, and “<” means significantly lower. Parameters whose variations accompanied positive JOL reactivity at the behavioral level were highlighted in boldface.
On the other hand, there was considerable evidence that the process-level patterns of JOL reactivity varied with material type, JOL type, and test format. To illustrate, in terms of material type, immediate JOLs enhanced the D parameter only for related pairs, not for unrelated pairs, in Experiment 2, and list-level JOLs enhanced the D parameter only for target-target related pairs, not for target-target unrelated pairs, in the free recall tests of Experiment 3. Moreover, when there was salient semantic information to process within each study item (such as the strongly related pairs in Experiment 1 and the related pairs in Experiment 2), positive JOL reactivity was consistently located in the verbatim-based recollective parameters (the D and/or F parameters). However, when there were salient semantic relations between individual study items (such as the target-target related pairs in Experiment 3 and the randomized categorical lists in Experiment 4), positive reactivity was located in both verbatim-based recollective parameters and gist-based non-recollective parameters (the R and/or J parameters). In terms of JOL type, the process-level patterns differed considerably between immediate and prestudy JOLs and between item-level and list-level JOLs. In Experiment 2, for related word pairs, immediate JOLs improved the D parameter and did not affect the F parameter, whereas prestudy JOLs did not affect the D parameter but increased the F parameter. Meanwhile, for unrelated word pairs, immediate JOLs had no effect on the D or F parameter, whereas prestudy JOLs impaired the D parameter and increased the F parameter. In Experiment 3, list-level JOLs enhanced the D parameter for target-target related pairs in free recall, whereas item-level JOLs impaired it. However, in Experiment 4, list-level JOLs impaired the D parameter for randomized categorical lists in free recall, whereas item-level JOLs improved it.
Thus, with the same materials and test format, different types of JOLs produced different effects on the underlying retrieval processes. In terms of test format, in Experiment 3, list-level JOLs enhanced the D, R, and J parameters for target-target related pairs in free recall but did not affect any parameter for the same materials in associative recall. Conversely, list-level JOLs lowered the J parameters for target-target unrelated pairs in associative recall but had no effect on any parameter for the same materials in free recall. Hence, not only the behavioral patterns but also the underlying process-level mechanisms showed that JOL reactivity varied as a function of test format. In summary, the dual-retrieval model analyses revealed that positive JOL reactivity was consistently accompanied by an increase in the D parameter, showing that the memory benefits of JOLs are partly attributable to better recollection of item-specific verbatim details. Meanwhile, the effects of JOLs on the underlying retrieval processes varied with material type, JOL type, and test format, suggesting that JOL reactivity adapts flexibly to the specific learning situation. Although the D parameter results support Senkova and Otani’s (2021) item-specific hypothesis, the latter findings suggest that this hypothesis cannot fully account for JOL reactivity. Specifically, the effects of JOLs on subsequent memory are not restricted to enhancing item-specific processing, although this is a consistent contributing factor. Rather, the types of cues that JOLs draw upon and strengthen depend on the specific type of processing stimulated by the study materials and the particular types of cues toward which JOLs slant processing. Whether the strengthened cues eventually lead to positive reactivity depends on whether they overlap with the types of cues to which memory tests are sensitive.
Thus, the model findings are consistent with the cue-strengthening hypothesis, which is discussed in more detail below.

Theoretical Implications and Future Directions

A Contextual Framework for Understanding JOL Reactivity

Thus far, JOL reactivity has been established in many experiments (Dougherty et al., 2005; Janes et al., 2018; Mitchum et al., 2016; Myers et al., 2020; Rivers et al., 2021; Senkova & Otani, 2021; Soderstrom et al., 2015; Tauber & Witherby, 2019; Tekin & Roediger, 2020; Witherby & Tauber, 2017b; Yang et al., 2015; Zechmeister & Shaughnessy, 1980; Zhao et al., 2021). However, other experiments have failed to find the effect (Ariel et al., 2021; Benjamin et al., 1998; Dougherty et al., 2018; Kelemen & Weaver III, 1997; Kornell & Bjork, 2008; Tauber & Rhodes, 2012). Therefore, an obvious theoretical goal is to develop a coherent explanation of JOL reactivity that specifies when it will be present and when it will be absent. In that connection, it would be beneficial to adopt a contextual framework for understanding JOL reactivity. In Jenkins's (1979) tetrahedral model of memory experiments, memory performance is considered a contextual phenomenon based on four clusters of variables: subject characteristics (e.g., ability, interest), encoding tasks (e.g., directions or instructions provided at encoding), study materials (e.g., type of to-be-remembered materials), and criterial tests (e.g., recall, recognition; see also McDaniel & Butler, 2011; Roediger, 2008). Additionally, the tetrahedral model assumes that these variables interact with each other. Specifically, the model envisions the four clusters of variables as the four corners of a tetrahedron. Thus, an edge between two corners represents a two-way interaction between the two variables, and a face of the tetrahedron represents a three-way interaction among three variables.
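The geometry of the tetrahedral model fixes how many interactions it contains: four corners yield six edges (two-way interactions) and four faces (three-way interactions). The Python sketch below simply enumerates them from the four clusters of variables named above; it is an illustrative aid, and the function name tetrahedron_interactions is hypothetical.

```python
from itertools import combinations

# The four corners of Jenkins's (1979) tetrahedral model, as listed in the text.
CORNERS = (
    "subject characteristics",
    "encoding tasks",
    "study materials",
    "criterial tests",
)


def tetrahedron_interactions(corners):
    """Return (edges, faces): every pair and every triple of corners."""
    edges = list(combinations(corners, 2))  # two-way interactions
    faces = list(combinations(corners, 3))  # three-way interactions
    return edges, faces


if __name__ == "__main__":
    edges, faces = tetrahedron_interactions(CORNERS)
    print(f"{len(edges)} two-way interactions, {len(faces)} three-way interactions")
    for face in faces:
        print(" x ".join(face))
```

The face that omits subject characteristics (encoding tasks × study materials × criterial tests) is the three-way interaction on which the cue-strengthening hypothesis concentrates; the other three faces each bring subject characteristics into play.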
It should be noted that the cue-strengthening hypothesis highlights the interactions among three dimensions of the tetrahedral model: study materials, encoding task (i.e., JOL), and criterial test, as the hypothesis assumes that JOL reactivity occurs only when JOLs strengthen the cues embedded in the study materials and the subsequent memory tests are sensitive to the strengthened cues. Myers et al.’s (2020) experiments featured the interaction between two dimensions of the tetrahedral model: study materials and criterial test. Based on their finding that item-level JOLs enhanced associative recall but not free recall for related pairs, they drew an inference that is consistent with the cue-strengthening hypothesis: “the direction and strength of JOL reactivity depend on both the study material and type of final test” (p. 755). The behavioral and model findings in the current experiments were consistent with Myers et al.’s notion, as discussed in the previous section. Moreover, in Experiments 2 and 3, my results showed that reactive effects differ markedly between prestudy and immediate JOLs and between item-level and list-level JOLs. Thus, my results extend Myers et al.’s inference by showing that JOL reactivity also depends on a third factor, JOL type, which fits into the “encoding task” dimension of the tetrahedral model. Therefore, consistent with the cue-strengthening hypothesis, a three-way interaction among study materials, encoding task, and criterial test was detected, which indicates that the overlap among the cues embedded in the study materials, the cues that inform JOLs, and the cues used on the final memory test serves as a key determinant of JOL reactivity. The tetrahedral model thus appears to be a promising framework for extending the cue-strengthening hypothesis and for further understanding JOL reactivity.
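The overlap condition can be illustrated with a toy set-intersection rule. In the Python sketch below, each study material, JOL type, and test format from Experiment 3 is tagged with a hypothetical set of cue labels, and positive reactivity is predicted only when some cue is embedded in the materials, strengthened by the JOL, and used by the test. The labels and the function name predicts_positive_reactivity are illustrative assumptions chosen to mirror the predictions stated for Experiment 3, not measured variables, and the rule only says whether positive reactivity should occur; it says nothing about its magnitude or about negative reactivity.

```python
# Hypothetical cue labels for the Experiment 3 conditions (illustrative only).
MATERIAL_CUES = {
    "target-target related": {"inter-pair relation"},
    "target-target unrelated": set(),  # no within- or between-pair relation
}
JOL_CUES = {
    "no-JOL": set(),
    "item-JOL": {"within-pair relation"},
    "list-JOL": {"inter-pair relation"},
}
TEST_CUES = {
    "associative recall": {"within-pair relation"},
    "free recall": {"inter-pair relation"},
}


def predicts_positive_reactivity(material: str, jol: str, test: str) -> bool:
    """True if some cue is present in the materials, strengthened by the
    JOL, and used by the test (the three-way overlap)."""
    strengthened = MATERIAL_CUES[material] & JOL_CUES[jol]
    return bool(strengthened & TEST_CUES[test])


if __name__ == "__main__":
    for material in MATERIAL_CUES:
        for jol in ("item-JOL", "list-JOL"):
            for test in TEST_CUES:
                verdict = predicts_positive_reactivity(material, jol, test)
                print(f"{material:24} {jol:9} {test:19} -> {verdict}")
```

Run as a script, the sketch flags only the combination of target-target related pairs, list-level JOLs, and free recall, which matches the behavioral pattern observed in Experiment 3.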
Future research would benefit from exploring the other dimensions of the tetrahedral model as well as the interactions among those dimensions. For instance, future studies could investigate how subject characteristics affect JOL reactivity and how those variables interact with variables in other dimensions. In that regard, some studies have investigated developmental trends in JOL reactivity. Tauber and Witherby (2019) found that although making JOLs consistently produced positive reactivity in younger adults’ associative recall for related word pairs, older adults’ recall was not affected by making JOLs. Meanwhile, Zhao et al. (2021) found that making JOLs enhanced younger and older children’s recognition of word lists, just as it did for young adults, and that the magnitude of positive JOL reactivity increased with age. Therefore, JOL reactivity seems to vary as a function of age. However, Zhao et al. manipulated JOL solicitation within subjects, whereas Tauber and Witherby manipulated it between subjects. Thus, it would be worth examining whether positive JOL reactivity emerges in older adults when the solicitation of JOLs is manipulated within subjects. Interestingly, Zhao et al. (2021) also reported considerable individual differences in JOL reactivity. When they decomposed the reactive effects of JOLs at the individual level, they found that although the majority of children showed positive reactivity, a substantial proportion did not. Thus, it would be informative to incorporate individual-level analyses into future JOL reactivity research and to investigate which factors predict individual differences in JOL reactivity. By the same token, it would also be beneficial to examine JOL reactivity with other types of study materials. In the current study, I used relatively simple study materials, such as word pairs and word lists.
Thus, one recommendation for future research is to examine JOL reactivity with more complex study materials, such as pictures, sentences, or texts. With respect to pictures, the only study I am aware of is Sommer et al. (1995), who reported that making JOLs improved recognition of face images. With respect to text materials, Ariel et al. (2021) recently reported that making aggregate or term-specific JOLs for a science text did not enhance participants’ performance on later short-answer questions unless overt retrieval was prompted before the JOLs. Here, aggregate JOLs were solicited after reading a complete science text (e.g., how confident are you that you understand the text), whereas term-specific JOLs were solicited after reading a subsection of the text devoted to a single concept (e.g., “how confident are you that you understand how minerals are made”; p. 700). Why did JOLs produce robust positive reactivity for related word pairs and word lists but not for longer texts? Ariel et al. proposed that JOLs prompt different retrieval dynamics with the former than with the latter. Namely, with more complex materials such as science texts, making JOLs may elicit less effortful retrieval and earlier termination than it does with simpler materials such as word pairs and word lists. In support of this retrieval-dynamics proposal, Ariel et al. found that when participants answered short-answer questions about the text as a retrieval prompt before making a JOL, positive JOL reactivity emerged again. It is also worth noting that JOLs for text materials were solicited at the global level, after at least a cluster of words had been read.
On a related note, the present dissertation showed that list-level JOLs, which are also a type of global-level JOL, did not display stable reactivity: They produced positive reactivity in free recall for target-target related pairs in Experiment 3 but not for blocked categorical lists in Experiment 4. Thus, it is possible that reactivity of JOLs solicited at the global level is, in general, less robust than that of JOLs solicited at the local level. In brief, the causes of the volatility of global-level JOL reactivity for complex materials remain an open question that merits further investigation. In sum, it is recommended that future JOL reactivity research adopt a contextual framework based on the tetrahedral model of memory experiments, which can serve as a scaffold for extending the cue-strengthening hypothesis. Accordingly, researchers are encouraged to explore the less studied dimensions of the tetrahedral model and the interactions among its dimensions, such as how JOL reactivity varies with individual difference variables and with the complexity of study materials.

Implications for Research on Other Encoding Tasks

In Chapter 1, I discussed how the literature on various encoding tasks provides critical implications for JOL reactivity. In turn, the present findings on JOL reactivity also have implications for research on other encoding tasks. First, and most obviously, researchers should consider JOLs an encoding task in their own right rather than a pure measure of metamemory. As evident in Soderstrom et al. (2015) and Tekin and Roediger (2020), the solicitation of JOLs substantially attenuated the generation effect and the depth-of-processing effect. Namely, making JOLs decreased the memory difference between generated and read items and between deeply and shallowly processed items.
Thus, when both JOLs and another encoding task are implemented in an experimental design, researchers should always consider the possibility that the memory effects of the encoding task of interest may be moderated by JOLs. Second, the overlaps in surface format, memory effects, and theoretical explanations between JOLs and other encoding tasks suggest that they may be considered within a unified theoretical framework. JOLs and other common encoding tasks, such as deep processing or survival processing, all require participants to make judgments about study materials during encoding, and these judgments produce reliable effects on memory performance. Meanwhile, the cue-strengthening hypothesis partly builds upon the transfer-appropriate multifactor account of the generation effect, which implies that the effects of both JOLs and generation are subject to the principle of transfer-appropriate processing. Considering the resemblance between JOLs and other common encoding tasks, research on other encoding tasks should also benefit from adopting a contextual framework that emphasizes the interactions among subject characteristics, study materials, encoding tasks, and criterial tests. Taking the deep processing task as an illustration, evidence has shown that the levels-of-processing effect is constrained by test format and material type. Morris et al. (1977) showed that although a deep (semantic) processing task produced better memory than a shallow (phonetic) processing task on a standard recognition test, the pattern was reversed on a rhyming recognition test. On the rhyming recognition test, the test items were not the original study words but words that rhymed with the study words. In this case, Morris et al. found that the shallow processing task led to better performance than the deep processing task, suggesting that the depth-of-processing effect depends on the match between the encoding task and the test format.
Moreover, deep processing has been shown to increase recall of critical distractors (e.g., sleep) for semantic Deese-Roediger-McDermott (DRM) lists (e.g., a list of words that are forward associates of “sleep”, such as bed, doze, awake, nap, yawn, …) compared with shallow processing (Thapar & McDermott, 2001; Toglia et al., 1999). However, Chan et al. (2005) found that shallow processing produced higher recall of critical distractors than deep processing did for phonological DRM lists (e.g., a list of words that sound like “sleep”, such as sweep, steep, sleet, slop, heap, …). This indicates that the depth-of-processing effect is also subject to the interaction between encoding task and material type. Similarly, the survival processing effect has been found to vary as a function of material type and test format. Butler et al. (2009) used three types of word lists: a list relevant to a grassland survival scenario, a list relevant to a bank robbery scenario, and a list irrelevant to both scenarios. While studying the three word lists, participants were asked to rate the words’ relevance to either the grassland survival scenario or the bank robbery scenario. Butler et al. found an interaction between list type and rating instruction: Recall performance for irrelevant lists was comparable between the two rating conditions, whereas survival rating led to better recall for survival-relevant lists, and robbery rating led to better recall for robbery-relevant lists. This suggests that the memory benefits of survival processing rely on the congruity between the content of study materials and the encoding task (but see Nairne & Pandeirada, 2011). Meanwhile, Bröder et al. (2011) found that the memory benefits of survival processing did not extend from item memory to source memory, demonstrating that test format is a boundary condition for the survival processing effect.
To sum up, researchers should always consider the likelihood of interactions between JOLs and other encoding tasks, because JOLs are themselves an independent encoding task. More importantly, JOLs bear a close resemblance to common encoding tasks such as deep processing and survival processing. Beyond the surface similarity that they all solicit judgments about study materials during encoding, their memory effects all vary with material type, encoding task, and test format. Thus, researchers are encouraged to adopt a contextual framework to investigate the memory effects of both JOLs and other similar encoding tasks.
Questions That Remain to Be Answered
One important question that merits further consideration for the cue-strengthening hypothesis is under what conditions JOLs strengthen diagnostic cues. Recall that the cue-strengthening hypothesis was formed by combining the cue-utilization framework for JOLs (Koriat, 1997) with the transfer-appropriate multifactor account of the generation effect (de Winstanley et al., 1996). Here, the cue-utilization framework stresses that JOLs are based on three types of cues: intrinsic, extrinsic, and mnemonic cues. The transfer-appropriate multifactor account posits that “the act of generation strengthens whatever type of information is used by the learner to complete the generation task” (Soderstrom et al., 2015, p. 554). Therefore, the cue-strengthening hypothesis seems to assume that JOLs strengthen whatever cues are used in forming them. However, my results suggest that this is not necessarily the case. For instance, in Experiment 3, item-level JOLs were higher for target-target related pairs than for target-target unrelated pairs, suggesting that inter-pair relations were processed when making item-level JOLs.
However, item-level JOLs produced no reactivity even when the later free recall test was sensitive to inter-pair relations, which suggests that item-level JOLs did not strengthen the processing of inter-pair relations, at least not to a statistically detectable degree. Such results cast doubt on the assumption that whatever cues are used in JOLs are strengthened by the act of making JOLs. In that connection, Soderstrom et al. (2015) suggested that cue salience may be a precondition for cue strengthening. That is, only cues that are easily discernible (e.g., the cue-target relation in strongly related pairs) would be efficiently used in forming JOLs and thus be strengthened, which would explain why reactivity was much weaker for weakly related or unrelated pairs. However, such an account would have difficulty explaining why positive reactivity arose for weakly related pairs in Experiment 1 of the present dissertation and in Tauber and Witherby (2019; Experiments 3, 4, & 5). Notably, the magnitude of positive reactivity was comparable between strongly and weakly related pairs in the former, and it was even numerically larger for weakly than for strongly related pairs in the latter. Therefore, further research is required to specify the exact conditions under which JOLs strengthen the cues embedded in study materials.
Another question that remains to be answered is whether JOL reactivity stems from incidental improvements in learning processes or from intentional, strategic responses to the demand for self-assessment (Double et al., 2018). As Double et al. discussed, although JOLs are only intended to tap retrospective evaluation of items that have just been studied, they can also prime prospective evaluation of to-be-studied items because they are repeatedly solicited throughout the learning process. Thus, it is possible that a JOL made for a prior item provides feedback for adjusting study strategies for subsequent items.
However, my results in Experiment 2 did not support this speculation: Prestudy JOLs, which should prompt prospective evaluation and motivate participants to update their learning strategies, did not induce positive reactivity. Nevertheless, this result does not necessarily rule out the possibility that strategic responses are involved in JOL reactivity. To investigate this issue further, self-report measures of metacognitive experiences could be administered (e.g., Mitchum et al., 2016; Rivers et al., 2021). For example, researchers could ask participants to report whether they consciously engage in different processing strategies for items that are followed by a JOL versus items that are not.
The last question is whether JOL reactivity varies as a function of JOL accuracy. That is, if people make more accurate JOLs, would JOLs directly enhance memory to a larger extent? Double (2019) examined a mirrored version of this question: whether JOLs impair memory when they are less accurate. He manipulated the font sizes of a pure list of related pairs and a pure list of unrelated pairs. Font size has been widely studied as a cue that induces a dissociation between JOLs and actual memory: Larger font sizes reliably increase JOLs but do not necessarily improve memory (Chang & Brainerd, 2022). Thus, Double expected that when people’s attention is captured by font size, which is a salient but nondiagnostic cue, they may base JOLs primarily on font size rather than on other less salient but diagnostic cues. Consequently, making JOLs should strengthen the processing of uninformative cues and thus impair rather than benefit future recall. His findings were consistent with this hypothesis, but they have yet to be replicated. Meanwhile, it would be informative to investigate whether improving JOL accuracy, such as through metacognitive training, would enhance JOL reactivity.
Concluding Comments
The behavioral findings in the present dissertation were more consistent with the predictions of the cue-strengthening hypothesis than with those of the changed-goal hypothesis, thus offering preferential support for the former. Moreover, the dual-retrieval model results demonstrated that although enhanced recollection was a hallmark of positive JOL reactivity, it was not the sole component, as JOLs also enhanced non-recollective operations when inter-item relations rather than item-specific cues were featured in study materials. Further, the process-level pattern of JOL reactivity depended heavily on the overlap in cues among study materials, JOL tasks, and memory tests. Thus, it is recommended that future studies adopt a contextual framework for understanding JOL reactivity.
References
Arbuckle, T. Y., & Cuddy, L. L. (1969). Discrimination of item strength at time of presentation. Journal of Experimental Psychology, 81(1), 126–131. https://doi.org/10.1037/h0027455
Ariel, R., Dunlosky, J., & Bailey, H. (2009). Agenda-based regulation of study-time allocation: When agendas override item-based monitoring. Journal of Experimental Psychology: General, 138(3), 432–447. https://doi.org/10.1037/a0015928
Ariel, R., Karpicke, J. D., Witherby, A. E., & Tauber, S. K. (2021). Do judgments of learning directly enhance learning of educational materials? Educational Psychology Review, 33(2), 693–712. https://doi.org/10.1007/s10648-020-09556-8
Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127(1), 55–68. https://doi.org/10.1037//0096-3445.127.1.55
Besken, M., & Mulligan, N. W. (2013). Easily perceived, easily remembered? Perceptual interference produces a double dissociation between metamemory and memory performance. Memory & Cognition, 41(6), 897–903. https://doi.org/10.3758/s13421-013-0307-8
Bowen, H. J., Gallant, S. N., & Moon, D. H. (2020). Influence of reward motivation on directed forgetting in younger and older adults. Frontiers in Psychology, 11, 1764. https://doi.org/10.3389/fpsyg.2020.01764
Bower, G. H., & Karlin, M. B. (1974). Depth of processing pictures of faces and recognition memory. Journal of Experimental Psychology, 103(4), 751–757. https://doi.org/10.1037/h0037190
Brainerd, C. J., & Reyna, V. F. (1998). Fuzzy-trace theory and children’s false memories. Journal of Experimental Child Psychology, 71(2), 81–129. https://doi.org/10.1006/jecp.1998.2464
Brainerd, C. J., & Reyna, V. F. (2010). Recollective and nonrecollective recall. Journal of Memory and Language, 63(3), 425–445. https://doi.org/10.1016/j.jml.2010.05.002
Brainerd, C. J., Reyna, V. F., & Howe, M. L. (2009). Trichotomous processes in early memory development, aging, and neurocognitive impairment: A unified theory. Psychological Review, 116(4), 783–832. https://doi.org/10.1037/a0016963
Brainerd, C. J., Wright, R., Reyna, V. F., & Payne, D. G. (2002). Dual-retrieval processes in free and associative recall. Journal of Memory and Language, 46(1), 120–152. https://doi.org/10.1006/jmla.2001.2796
Bröder, A., Krüger, N., & Schütte, S. (2011). The survival processing memory effect should generalise to source memory, but it doesn’t. Psychology, 2(9), 896–901. https://doi.org/10.4236/psych.2011.29135
Butler, A. C., Kang, S. H. K., & Roediger, H. L. (2009). Congruity effects between materials and processing tasks in the survival processing paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(6), 1477–1486. https://doi.org/10.1037/a0017024
Castel, A. D. (2008). Metacognition and learning about primacy and recency effects in free recall: The utilization of intrinsic and extrinsic cues when making judgments of learning. Memory & Cognition, 36(2), 429–437. https://doi.org/10.3758/MC.36.2.429
Castel, A. D., McCabe, D. P., & Roediger, H. L. (2007). Illusions of competence and overestimation of associative memory for identical items: Evidence from judgments of learning. Psychonomic Bulletin & Review, 14(1), 107–111. https://doi.org/10.3758/BF03194036
Chan, J. C. K., McDermott, K. B., Watson, J. M., & Gallo, D. A. (2005). The importance of material-processing interactions in inducing false memories. Memory & Cognition, 33(3), 389–395. https://doi.org/10.3758/BF03193057
Chang, M. (2019). Dual-retrieval models and metamemory in younger and older adults [Unpublished master’s thesis, Cornell University]. https://ecommons.cornell.edu/handle/1813/70006
Chang, M., & Brainerd, C. J. (2022). Association and dissociation between judgments of learning and memory: A meta-analysis of the font size effect. Metacognition and Learning. https://doi.org/10.1007/s11409-021-09287-3
Coltheart, M. (1981). The MRC Psycholinguistic Database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497–505. https://doi.org/10.1080/14640748108400805
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684. https://doi.org/10.1016/S0022-5371(72)80001-X
Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104(3), 268–294. https://doi.org/10.1037/0096-3445.104.3.268
de Winstanley, P. A., Bjork, E. L., & Bjork, R. A. (1996). Generation effects and the lack thereof: The role of transfer-appropriate processing. Memory, 4(1), 31–48. https://doi.org/10.1080/741940667
Double, K. S. (2019). Do judgments of learning impair recall when uninformative cues are salient? PsyArXiv. https://doi.org/10.31234/osf.io/a5bxw
Double, K. S., & Birney, D. P. (2019). Reactivity to measures of metacognition. Frontiers in Psychology, 10, 2755. https://doi.org/10.3389/fpsyg.2019.02755
Double, K. S., Birney, D. P., & Walker, S. A. (2018). A meta-analysis and systematic review of reactivity to judgements of learning. Memory, 26(6), 741–750. https://doi.org/10.1080/09658211.2017.1404111
Dougherty, M. R., Robey, A. M., & Buttaccio, D. (2018). Do metacognitive judgments alter memory performance beyond the benefits of retrieval practice? A comment on and replication attempt of Dougherty, Scheck, Nelson, and Narens (2005). Memory & Cognition, 46(4), 558–565. https://doi.org/10.3758/s13421-018-0791-y
Dougherty, M. R., Scheck, P., Nelson, T. O., & Narens, L. (2005). Using the past to predict the future. Memory & Cognition, 33(6), 1096–1115. https://doi.org/10.3758/BF03193216
Dunlosky, J., & Ariel, R. (2011). Self-regulated learning and the allocation of study time. In B. H. Ross (Ed.), Psychology of Learning and Motivation (Vol. 54, pp. 103–140). Academic Press. https://doi.org/10.1016/B978-0-12-385527-5.00004-8
Dunlosky, J., & Hertzog, C. (1998). Training programs to improve learning in later adulthood: Helping older adults educate themselves. In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 249–275). Lawrence Erlbaum Associates Publishers.
Geller, J. (2017). Would disfluency by any other name still be disfluent? Examining the boundary conditions of the disfluency effect [Doctoral dissertation, Iowa State University]. https://lib.dr.iastate.edu/etd/15520/
Gomes, C. F. A., Brainerd, C. J., Nakamura, K., & Reyna, V. F. (2014). Markovian interpretations of dual retrieval processes. Journal of Mathematical Psychology, 59, 50–64. https://doi.org/10.1016/j.jmp.2013.07.003
Halamish, V. (2018). Can very small font size enhance memory? Memory & Cognition, 46(6), 979–993. https://doi.org/10.3758/s13421-018-0816-6
Ikeda, K., Yue, C. L., Murayama, K., & Castel, A. D. (2016). Achievement goals affect metacognitive judgments. Motivation Science, 2(4), 199–219. https://doi.org/10.1037/mot0000047
Janes, J. L., Rivers, M. L., & Dunlosky, J. (2018). The influence of making judgments of learning on memory performance: Positive, negative, or both? Psychonomic Bulletin & Review, 25(6), 2356–2364. https://doi.org/10.3758/s13423-018-1463-4
Jenkins, J. J. (1979). Four points to remember: A tetrahedral model of memory experiments. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 429–446). Hillsdale, NJ: Erlbaum Associates.
Jensen, A. R., & Rohwer, W. D. (1963). Verbal mediation in paired-associate and serial learning. Journal of Verbal Learning and Verbal Behavior, 1(5), 346–352. https://doi.org/10.1016/S0022-5371(63)80015-8
Karpicke, J. D. (2017). Retrieval-based learning: A decade of progress. In J. Wixted (Ed.), Cognitive psychology of memory, Vol. 2 of Learning and memory: A comprehensive reference (J. H. Byrne, Series Ed., pp. 487–514). https://doi.org/10.1016/B978-0-12-809324-5.21055-9
Kelemen, W. L., & Weaver, C. A., III. (1997). Enhanced metamemory at delays: Why do judgments of learning improve over time? Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(6), 1394–1409. https://doi.org/10.1037/0278-7393.23.6.1394
King, J. F., Zechmeister, E. B., & Shaughnessy, J. J. (1980). Judgments of knowing: The influence of retrieval practice. The American Journal of Psychology, 93(2), 329–343. https://doi.org/10.2307/1422236
Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126(4), 349–370. https://doi.org/10.1037/0096-3445.126.4.349
Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one’s knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2), 187–194.
Kornell, N., & Bjork, R. A. (2008). Optimising self-regulated study: The benefits—and costs—of dropping flashcards. Memory, 16(2), 125–136. https://doi.org/10.1080/09658210701763899
Maxwell, N. P., & Huff, M. J. (2021). The deceptive nature of associative word pairs: The effects of associative direction on judgments of learning. Psychological Research, 85, 1757–1775. https://doi.org/10.1007/s00426-020-01342-z
Mazzoni, G., & Nelson, T. O. (1995). Judgments of learning are affected by the kind of encoding in ways that cannot be attributed to the level of recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(5), 1263–1274. https://doi.org/10.1037/0278-7393.21.5.1263
McDaniel, M. A., & Butler, A. C. (2011). A contextual framework for understanding when difficulties are desirable. In A. S. Benjamin (Ed.), Successful remembering and successful forgetting: A festschrift in honor of Robert A. Bjork (pp. 175–198). Psychology Press.
Metcalfe, J., & Finn, B. (2008). Evidence that judgments of learning are causally related to study choice. Psychonomic Bulletin & Review, 15(1), 174–179. https://doi.org/10.3758/PBR.15.1.174
Metcalfe, J., & Kornell, N. (2005). A region of proximal learning model of study time allocation. Journal of Memory and Language, 52(4), 463–477. https://doi.org/10.1016/j.jml.2004.12.001
Mitchum, A. L., Kelley, C. M., & Fox, M. C. (2016). When asking the question changes the ultimate answer: Metamemory judgments change memory. Journal of Experimental Psychology: General, 145(2), 200–219. https://doi.org/10.1037/a0039923
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16(5), 519–533. https://doi.org/10.1016/S0022-5371(77)80016-9
Mueller, M. L., Dunlosky, J., & Tauber, S. K. (2016). The effect of identical word pairs on people’s metamemory judgments: What are the contributions of processing fluency and beliefs about memory? Quarterly Journal of Experimental Psychology, 69(4), 781–799. https://doi.org/10.1080/17470218.2015.1058404
Mueller, M. L., Tauber, S. K., & Dunlosky, J. (2013). Contributions of beliefs and processing fluency to the effect of relatedness on judgments of learning. Psychonomic Bulletin & Review, 20(2), 378–384. https://doi.org/10.3758/s13423-012-0343-6
Myers, S. J., Rhodes, M. G., & Hausman, H. E. (2020). Judgments of learning (JOLs) selectively improve memory depending on the type of test. Memory & Cognition, 48(5), 745–758. https://doi.org/10.3758/s13421-020-01025-5
Nairne, J. S., & Pandeirada, J. N. S. (2011). Congruity effects in the survival processing paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(2), 539–549. https://doi.org/10.1037/a0021960
Nairne, J. S., Thompson, S. R., & Pandeirada, J. N. S. (2007). Adaptive memory: Survival processing enhances retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(2), 263–273. https://doi.org/10.1037/0278-7393.33.2.263
Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36(3), 402–407. https://doi.org/10.3758/BF03195588
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. In G. H. Bower (Ed.), Psychology of Learning and Motivation (Vol. 26, pp. 125–173). Academic Press. https://doi.org/10.1016/S0079-7421(08)60053-5
Palan, S., & Schitter, C. (2018). Prolific.ac—A subject pool for online experiments. Journal of Behavioral and Experimental Finance, 17, 22–27. https://doi.org/10.1016/j.jbef.2017.12.004
Price, J., & Harrison, A. (2017). Examining what prestudy and immediate judgments of learning reveal about the bases of metamemory judgments. Journal of Memory and Language, 94, 177–194. https://doi.org/10.1016/j.jml.2016.12.003
Rhodes, M. G. (2016). Judgments of learning: Methods, data, and theory. In J. Dunlosky & S. K. Tauber (Eds.), The Oxford handbook of metamemory (pp. 65–80). Oxford University Press.
Rivers, M. L., & Dunlosky, J. (2021). Are test-expectancy effects better explained by changes in encoding strategies or differential test experience? Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(2), 195–207. https://doi.org/10.1037/xlm0000949
Rivers, M. L., Janes, J. L., & Dunlosky, J. (2021). Investigating memory reactivity with a within-participant manipulation of judgments of learning: Support for the cue-strengthening hypothesis. Memory, 29(10), 1342–1353. https://doi.org/10.1080/09658211.2021.1985143
Roediger, H. L., III. (2008). Relativity of remembering: Why the laws of memory vanished. Annual Review of Psychology, 59(1), 225–254. https://doi.org/10.1146/annurev.psych.57.102904.190139
Rosner, T. M., Davis, H., & Milliken, B. (2015). Perceptual blurring and recognition memory: A desirable difficulty effect revealed. Acta Psychologica, 160, 11–22. https://doi.org/10.1016/j.actpsy.2015.06.006
Sahakyan, L., Delaney, P. F., & Kelley, C. M. (2004). Self-evaluation as a moderating factor of strategy change in directed forgetting benefits. Psychonomic Bulletin & Review, 11(1), 131–136. https://doi.org/10.3758/BF03206472
Schäfer, F., & Undorf, M. (2021). Positive and negative reactivity in judgments of learning: Shared or distinct mechanisms? 63rd Conference of Experimental Psychologists, Ulm, Germany.
Schwenn, E. A., & Underwood, B. J. (1968). The effect of formal and associative similarity on paired-associate and free-recall learning. Journal of Verbal Learning and Verbal Behavior, 7(4), 817–824. https://doi.org/10.1016/S0022-5371(68)80147-1
Senkova, O., & Otani, H. (2021). Making judgments of learning enhances memory by inducing item-specific processing. Memory & Cognition, 49, 955–967. https://doi.org/10.3758/s13421-020-01133-2
Soderstrom, N. C., Clark, C. T., Halamish, V., & Bjork, E. L. (2015). Judgments of learning as memory modifiers. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(2), 553–558. https://doi.org/10.1037/a0038388
Soderstrom, N. C., & McCabe, D. P. (2011). The interplay between value and relatedness as bases for metacognitive monitoring and control: Evidence for agenda-based monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(5), 1236–1242. https://doi.org/10.1037/a0023548
Sommer, W., Heinz, A., Leuthold, H., Matt, J., & Schweinberger, S. (1995). Metamemory, distinctiveness, and event-related potentials in recognition memory for faces. Memory & Cognition, 23, 1–11. https://doi.org/10.3758/BF03210552
Son, L., & Metcalfe, J. (2000). Metacognitive and control strategies in study-time allocation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(1), 204–221. https://doi.org/10.1037/0278-7393.26.1.204
Stevens, A. S., & Pierce, B. H. (2019). Do reactive effects of judgments of learning extend to words lists? 2019 Annual Meeting of the Psychonomic Society, Montreal, QC, Canada.
Tauber, S. K., & Rhodes, M. G. (2012). Measuring memory monitoring with judgements of retention (JORs). Quarterly Journal of Experimental Psychology, 65(7), 1376–1396. https://doi.org/10.1080/17470218.2012.656665
Tauber, S. K., & Witherby, A. E. (2019). Do judgments of learning modify older adults’ actual learning? Psychology and Aging, 34(6), 836–847. https://doi.org/10.1037/pag0000376
Tekin, E., & Roediger, H. L. (2020). Reactivity of judgments of learning in a levels-of-processing paradigm. Zeitschrift für Psychologie, 228(4), 278–290. https://doi.org/10.1027/2151-2604/a000425
Thapar, A., & McDermott, K. B. (2001). False recall and false recognition induced by presentation of associated words: Effects of retention interval and level of processing. Memory & Cognition, 29(3), 424–432. https://doi.org/10.3758/BF03196393
Toglia, M. P., Neuschatz, J. S., & Goodwin, K. A. (1999). Recall accuracy and illusory memories: When more is less. Memory, 7(2), 233–256. https://doi.org/10.1080/741944069
Underwood, B. J., Ekstrand, B. R., & Keppel, G. (1965). An analysis of intralist similarity in verbal learning with experiments on conceptual similarity. Journal of Verbal Learning and Verbal Behavior, 4(6), 447–462. https://doi.org/10.1016/S0022-5371(65)80042-1
Undorf, M., & Bröder, A. (2020). Cue integration in metamemory judgements is strategic. Quarterly Journal of Experimental Psychology, 73(4), 629–642. https://doi.org/10.1177/1747021819882308
Van Overschelde, J. P., Rawson, K. A., & Dunlosky, J. (2004). Category norms: An updated and expanded version of the Battig and Montague (1969) norms. Journal of Memory and Language, 50(3), 289–335. https://doi.org/10.1016/j.jml.2003.10.003
Wilton, R. N. (2006). Interactive imagery and colour in paired-associate learning. Acta Psychologica, 121(1), 21–40. https://doi.org/10.1016/j.actpsy.2005.05.006
Witherby, A. E., & Tauber, S. K. (2017a). The concreteness effect on judgments of learning: Evaluating the contributions of fluency and beliefs. Memory & Cognition, 45(4), 639–650. https://doi.org/10.3758/s13421-016-0681-0
Witherby, A. E., & Tauber, S. K. (2017b). The influence of judgments of learning on long-term learning and short-term performance. Journal of Applied Research in Memory and Cognition, 6(4), 496–503. https://doi.org/10.1016/j.jarmac.2017.08.004
Yang, H., Cai, Y., Liu, Q., Zhao, X., Wang, Q., Chen, C., & Xue, G. (2015). Differential neural correlates underlie judgment of learning and subsequent memory performance. Frontiers in Psychology, 6, 1699. https://doi.org/10.3389/fpsyg.2015.01699
Yu, Y., Jiang, Y., & Li, F. (2020). The effect of value on judgment of learning in tradeoff learning condition: The mediating role of study time. Metacognition and Learning, 15, 435–454. https://doi.org/10.1007/s11409-020-09234-8
Yue, C. L., Castel, A. D., & Bjork, R. A. (2013). When disfluency is—and is not—a desirable difficulty: The influence of typeface clarity on metacognitive judgments and memory. Memory & Cognition, 41(2), 229–241. https://doi.org/10.3758/s13421-012-0255-8
Zechmeister, E. B., & Shaughnessy, J. J. (1980). When you know that you know and when you think that you know but you don’t. Bulletin of the Psychonomic Society, 15(1), 41–44. https://doi.org/10.3758/BF03329756
Zhao, W., Li, B., Shanks, D. R., Zhao, W., Zheng, J., Hu, X., Su, N., Fan, T., Yin, Y., Luo, L., & Yang, C. (2021). When judging what you know changes what you really know: Soliciting metamemory judgments reactively enhances children’s learning. Child Development, 93, 405–417. https://doi.org/10.1111/cdev.13689
Appendix A
I used slightly different versions of the dual-retrieval model in Experiments 1, 2, and 3 and in Experiment 4 to accommodate the methodological differences between these experiments. Word pairs with associative or free recall tests were used in Experiments 1, 2, and 3, whereas lists of single words with free recall tests were used in Experiment 4. The dual-retrieval model used in the first three experiments is described below. Each outcome pattern lists the result on recall tests 1, 2, and 3 in order (C = correct recall, E = error):
$$
\begin{aligned}
p(\mathrm{CCC}) &= D(1 - F) + (1 - D)RJ_1J_2J_3 && \text{(A1)}\\
p(\mathrm{CCE}) &= (1 - D)RJ_1J_2(1 - J_3) && \text{(A2)}\\
p(\mathrm{CEC}) &= (1 - D)RJ_1(1 - J_2)J_3 && \text{(A3)}\\
p(\mathrm{CEE}) &= DF + (1 - D)RJ_1(1 - J_2)(1 - J_3) && \text{(A4)}\\
p(\mathrm{ECC}) &= (1 - D)R(1 - J_1)J_2J_3 && \text{(A5)}\\
p(\mathrm{ECE}) &= (1 - D)R(1 - J_1)J_2(1 - J_3) && \text{(A6)}\\
p(\mathrm{EEC}) &= (1 - D)R(1 - J_1)(1 - J_2)J_3 && \text{(A7)}\\
p(\mathrm{EEE}) &= (1 - D)R(1 - J_1)(1 - J_2)(1 - J_3) + (1 - D)(1 - R) && \text{(A8)}
\end{aligned}
$$
where $D$ is the probability that the verbatim trace of an item’s presentation can be directly accessed on a recall test, $R$ is the probability that an item can be reconstructed on a recall test when its verbatim trace cannot be accessed, $F$ is the probability that direct access succeeds on the first recall test but fails on both of the following two recall tests, and $J_1$, $J_2$, and $J_3$ are the probabilities that a reconstructed item is judged familiar enough to output on tests 1, 2, and 3, respectively. The dual-retrieval model used in Experiment 4 was slightly modified with respect to the $F$ parameter in Equations (A1) and (A2): $p(\mathrm{CCC})$ was expressed as $D(1 - F)(1 - F) + (1 - D)RJ_1J_2J_3$, and $p(\mathrm{CCE})$ was expressed as $D(1 - F)F + (1 - D)RJ_1J_2(1 - J_3)$. The only difference is that the original version assumes that the forgetting status remains invariant between the second and third recall tests, whereas the modified version also covers the situation in which participants retained direct access on the second recall test but lost it on the third. Namely, the $F$ parameter no longer stands for the probability of forgetting on both recall tests 2 and 3. Instead, it is defined as the probability that participants lose direct access due to forgetting on the second or the third recall test, with the assumption that the probability of forgetting is equal between the two recall tests.
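To make the bookkeeping in Equations (A1)–(A8) concrete, the following minimal Python sketch computes the eight predicted outcome probabilities from a given set of parameter values, under either the Experiments 1–3 version or the Experiment 4 modification. It also evaluates the G² statistic defined in Equations (A9) and (A10) in the next paragraph. It is not the fitting code used in this dissertation. It assumes that each pattern lists recall outcomes on tests 1, 2, and 3 in order, and that every observed pattern has a nonzero predicted probability. The function names and example parameter values are hypothetical. Parameter estimation is not shown; it would involve maximizing Equation (A9), for example with a numerical optimizer.

```python
import math

# Outcome patterns over recall tests 1, 2, 3 (C = correct, E = error).
PATTERNS = ("CCC", "CCE", "CEC", "CEE", "ECC", "ECE", "EEC", "EEE")


def predicted_probs(D, F, R, J1, J2, J3, experiment4=False):
    """Predicted pattern probabilities from Equations (A1)-(A8).

    Hypothetical helper. With experiment4=True, the direct-access terms of
    p(CCC) and p(CCE) use the Experiment 4 modification of F.
    """
    probs = {}
    for pattern in PATTERNS:
        # Reconstruction route: reconstructed with probability (1 - D)R, then
        # output on each test if judged familiar (J_k) or not (1 - J_k).
        p = (1 - D) * R
        for j, outcome in zip((J1, J2, J3), pattern):
            p *= j if outcome == "C" else 1 - j
        probs[pattern] = p

    # Direct-access route (verbatim trace accessed with probability D).
    if experiment4:
        probs["CCC"] += D * (1 - F) * (1 - F)
        probs["CCE"] += D * (1 - F) * F
    else:
        probs["CCC"] += D * (1 - F)
    probs["CEE"] += D * F

    # Neither direct access nor reconstruction: the item is never recalled.
    probs["EEE"] += (1 - D) * (1 - R)
    return probs


def log_likelihood(probs, counts):
    """Natural log of Equation (A9): sum of N(i) * ln p_i over patterns."""
    return sum(n * math.log(probs[k]) for k, n in counts.items() if n > 0)


def g_squared(probs, counts):
    """Equation (A10): G^2 = -2 ln(L6 / L7), with L7 the saturated model."""
    total = sum(counts.values())
    # Saturated model: each pattern's probability is its observed proportion.
    saturated = {k: n / total for k, n in counts.items()}
    return -2 * (log_likelihood(probs, counts) - log_likelihood(saturated, counts))


CRITICAL_G2_DF1 = 3.84  # chi-square critical value, df = 1, alpha = .05
```

For example, take the hypothetical values D = .3, F = .1, R = .5, and J1 = J2 = J3 = .6. Then p(CCC) = .3(.9) + .7(.5)(.6)³ = .3456, and the eight probabilities sum to 1 under either version of the model. A fitted model would be rejected when its G² exceeds CRITICAL_G2_DF1.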
The likelihood function for the data predicted by the dual-retrieval model is:

$$L_6 = \prod_i p_i^{N(i)} \tag{A9}$$

where $p_i$ is the predicted probability of response pattern $i$ (the left-hand sides of the preceding equations) and $N(i)$ is the observed count of that pattern. The model estimates six parameters ($D$, $F$, $R$, $J_1$, $J_2$, and $J_3$) from eight pattern probabilities that must sum to 1, so seven empirical probabilities are free to vary, which leaves one degree of freedom for $L_6$. To estimate goodness of fit, I compared the likelihood in Equation (A9) with the likelihood of the same data when all empirical probabilities are free to vary. The goodness-of-fit test is:

$$G^2 = -2\ln\left(\frac{L_6}{L_7}\right) \tag{A10}$$

where $L_6$ is the likelihood of the data predicted by the dual-retrieval model and $L_7$ is the likelihood of the same data when all empirical probabilities are free to vary. $G^2$ is asymptotically distributed as $\chi^2$ with one degree of freedom. Thus, the critical value for rejecting the null hypothesis at the .05 significance level is 3.84.

Appendix B

| Pair type | Cue | Target | Pair type | Cue | Target | Pair type | Cue | Target |
|---|---|---|---|---|---|---|---|---|
| Strong | spoon | fork | Weak | pliers | tweezers | Identical | ladder | ladder |
| Strong | quack | duck | Weak | cup | mug | Identical | nuts | nuts |
| Strong | crocodile | alligator | Weak | tomb | coffin | Identical | cafe | cafe |
| Strong | porpoise | dolphin | Weak | flea | insect | Identical | crown | crown |
| Strong | lips | kiss | Weak | beard | trim | Identical | toast | toast |
| Strong | daisy | flower | Weak | scalp | bald | Identical | skirt | skirt |
| Strong | gate | fence | Weak | spade | diamond | Identical | stamp | stamp |
| Strong | jam | jelly | Weak | tent | woods | Identical | ham | ham |
| Strong | bunny | rabbit | Weak | handbag | wallet | Identical | tray | tray |
| Strong | grandpa | grandma | Weak | plaster | ceiling | Identical | caravan | caravan |
| Strong | sock | shoe | Weak | collar | blouse | Identical | basket | basket |
| Strong | pull | push | Weak | stove | pipe | Identical | battery | battery |
| Strong | anchor | boat | Weak | trash | bag | Identical | pyramid | pyramid |
| Strong | salad | lettuce | Weak | van | bus | Identical | wedding | wedding |
| Strong | jigsaw | puzzle | Weak | hurt | cry | Identical | ivory | ivory |
| Strong | toaster | oven | Weak | icing | chocolate | Identical | cigar | cigar |
| Strong | hospital | sick | Weak | cloth | shirt | Identical | lion | lion |
| Strong | lime | lemon | Weak | animals | soft | Identical | hay | hay |
| Strong | parcel | package | Weak | alley | lane | Identical | string | string |
| Strong | niece | nephew | Weak | leather | purse | Identical | fiber | fiber |
| Strong | circus | clown | Weak | barley | soup | Identical | suburb | suburb |
| Strong | mustard | ketchup | Weak | blow | balloon | Identical | swim | swim |
| Strong | atom | bomb | Weak | compass | ruler | Identical | clock | clock |
| Strong | squid | octopus | Weak | alcohol | vodka | Identical | boss | boss |
| Strong | tornado | hurricane | Weak | dancer | belly | Identical | blonde | blonde |
| Strong | salt | pepper | Weak | cream | whip | Identical | jungle | jungle |
| Strong | cod | fish | Weak | vein | vessel | Identical | tooth | tooth |
| Strong | nest | bird | Weak | chaos | headache | Identical | bat | bat |
| Strong | bull | cow | Weak | flap | seal | Identical | mansion | mansion |
| Strong | mice | rat | Weak | gymnast | tumble | Identical | bronze | bronze |
| Strong | tractor | trailer | Weak | runner | blade | Identical | deaf | deaf |
| Strong | verb | noun | Weak | penguin | cute | Identical | thicket | thicket |

Appendix C

| Cue-target relation | Cue | Target | Cue-target relation | Cue | Target |
|---|---|---|---|---|---|
| Related | idiot | stupid | Unrelated | brush | coffee |
| Related | porpoise | dolphin | Unrelated | broom | dog |
| Related | wrong | right | Unrelated | crawl | bread |
| Related | shore | beach | Unrelated | cube | nurse |
| Related | officer | police | Unrelated | envelope | violin |
| Related | trash | garbage | Unrelated | fork | biology |
| Related | stem | flower | Unrelated | kind | grass |
| Related | joke | laugh | Unrelated | minister | cut |
| Related | empty | full | Unrelated | grape | pot |
| Related | bed | sleep | Unrelated | orchestra | smell |
| Related | daughter | son | Unrelated | strand | tool |
| Related | crust | pie | Unrelated | shallow | toy |
| Related | nephew | niece | Unrelated | alcohol | ghost |
| Related | cheek | fat | Unrelated | sow | liberty |
| Related | creek | river | Unrelated | bone | house |
| Related | corridor | hall | Unrelated | butterfly | beat |
| Related | album | record | Unrelated | carbon | throw |
| Related | grandpa | grandma | Unrelated | galaxy | squirrel |
| Related | swift | fast | Unrelated | heavy | garlic |
| Related | compass | direction | Unrelated | cocktail | cement |
| Related | coach | team | Unrelated | balloon | sex |
| Related | picture | frame | Unrelated | business | mosquito |
| Related | crops | corn | Unrelated | knowledge | reptile |
| Related | charm | bracelet | Unrelated | guest | banana |
| Related | ruby | red | Unrelated | literature | quack |
| Related | emerald | green | Unrelated | tight | farm |
| Related | comedian | funny | Unrelated | acrobat | verb |
| Related | reflection | mirror | Unrelated | author | clarinet |
| Related | crowd | people | Unrelated | basement | forest |
| Related | honey | sweet | Unrelated | biscuit | sad |
| Related | fright | scare | Unrelated | contract | foot |
| Related | conductor | train | Unrelated | dandruff | rage |
| Related | birth | death | Unrelated | virus | dream |
| Related | monument | statue | Unrelated | jewel | man |
| Related | easy | hard | Unrelated | lamb | meeting |
| Related | credit | card | Unrelated | host | up |
| Related | temple | church | Unrelated | fair | ski |
| Related | cough | cold | Unrelated | pet | art |
| Related | yarn | knit | Unrelated | sea | goat |
| Related | shooting | gun | Unrelated | sin | kite |

Appendix D

| Target-target relation | Cue | Target | Target-target relation | Cue | Target |
|---|---|---|---|---|---|
| Related | forest | uncle | Unrelated | quack | knife |
| Related | sob | aunt | Unrelated | quill | church |
| Related | joke | nephew | Unrelated | toe | noun |
| Related | sad | grandmother | Unrelated | boulder | chair |
| Related | kite | diamond | Unrelated | pony | leg |
| Related | scissors | ruby | Unrelated | calf | apple |
| Related | stumble | pearl | Unrelated | sail | gun |
| Related | paste | emerald | Unrelated | globe | president |
| Related | filth | bus | Unrelated | pilot | house |
| Related | ink | plane | Unrelated | comb | beer |
| Related | gift | boat | Unrelated | pine | robbery |
| Related | chalk | train | Unrelated | icing | ruler |
| Related | daisy | doll | Unrelated | pen | monk |
| Related | queen | ball | Unrelated | flood | ketchup |
| Related | pigeon | puzzle | Unrelated | slip | wood |
| Related | lamp | block | Unrelated | chapel | yard |
| Related | convent | steel | Unrelated | pupil | water |
| Related | rectangle | iron | Unrelated | tale | jazz |
| Related | circus | bronze | Unrelated | tickle | corn |
| Related | toad | lead | Unrelated | tangerine | boot |
| Related | trout | magazine | Unrelated | assist | beetle |
| Related | yacht | journal | Unrelated | profit | tulip |
| Related | verb | novel | Unrelated | jet | diabetes |
| Related | vase | encyclopedia | Unrelated | robin | Christmas |
| Related | library | soldier | Unrelated | lumber | cruise |
| Related | nail | private | Unrelated | atom | salmon |
| Related | scent | colonel | Unrelated | photo | python |
| Related | spoon | officer | Unrelated | peel | wine |
| Related | wallet | dog | Unrelated | transportation | hat |
| Related | web | cat | Unrelated | salt | butterfly |
| Related | stone | horse | Unrelated | soil | guitar |
| Related | cradle | lion | Unrelated | tall | lime |
| Related | despise | cotton | Unrelated | afraid | pencil |
| Related | swift | silk | Unrelated | yolk | purse |
| Related | cathedral | polyester | Unrelated | sock | tango |
| Related | leaf | wool | Unrelated | physician | ferry |
| Related | pet | blue | Unrelated | circle | spade |
| Related | umbrella | red | Unrelated | kitten | pepper |
| Related | dusk | green | Unrelated | empty | lemonade |
| Related | hammer | yellow | Unrelated | macaroni | policeman |

Appendix E

Blocked lists (column headers are the categorical labels; cells are the list words):

| A natural earth formation | A vegetable | A four-footed animal | A part of a building | A musical instrument |
|---|---|---|---|---|
| valley | potato | tiger | office | drum |
| river | squash | horse | stairs | guitar |
| canyon | pepper | rabbit | lobby | flute |
| volcano | lettuce | giraffe | ceiling | piano |
| ocean | radish | elephant | window | trumpet |
| cliff | carrot | moose | elevator | clarinet |
| island | tomato | squirrel | basement | violin |
| stream | cabbage | raccoon | floor | cello |

Randomized lists (column headers are the list labels; cells are the list words):

| List 1 | List 2 | List 3 | List 4 | List 5 |
|---|---|---|---|---|
| lettuce | squash | valley | drum | ocean |
| river | rabbit | giraffe | tomato | tiger |
| trumpet | pepper | guitar | basement | radish |
| ceiling | elevator | flute | squirrel | office |
| moose | elephant | carrot | cabbage | cello |
| canyon | piano | volcano | horse | stream |
| raccoon | lobby | island | floor | clarinet |
| stairs | potato | violin | cliff | window |