THE ROLE OF SOCIAL ENVIRONMENT IN SHAPING VOCAL COMMUNICATION SYSTEMS IN WILD SONGBIRDS

A Dissertation Presented to the Faculty of the Graduate School of Cornell University In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by Sara Christina Keen
May 2020

© 2020 Sara Christina Keen

Sara Christina Keen, Ph. D.
Cornell University 2020

ABSTRACT

For many taxa, vocal communication is an essential means of navigating continuously changing social and ecological environments. Among passerine birds, vocal signals are critical to survival and play an important role in mate attraction, territory defense, and predator avoidance. In my dissertation, I study acoustic communication in a wild population of blue tits and great tits in order to investigate the effects of social environment on birds’ responses to acoustic cues as well as their production of acoustic signals. In Chapter 1, I demonstrate that blue tits can learn to associate a novel acoustic cue with predation risk, and that the behavioral response to this cue can be socially transmitted to naïve great tits, despite their lack of first-hand experience. This study suggests that social learning of acoustic cues can occur between species. In Chapter 2, I develop an unsupervised machine learning approach that can objectively measure similarity of vocal signals. I present this technique such that it can be broadly applied for the analysis of diverse acoustic datasets. Using this approach, I then explore patterns of variation in great tit songs from multiple angles. In Chapter 3, I develop and test a mathematical model that describes the optimal levels of vocal similarity among neighbors under given social and ecological conditions.
This model predicts that immigrant and resident birds will exhibit comparable levels of vocal similarity with neighbors, despite having different song learning opportunities in their natal environments. My empirical results agree with model predictions, showing that although immigrants more often use complex, unshared songs, they achieve high levels of vocal similarity with neighbors by using larger repertoires than residents. I further explore spatial and temporal variation of great tit songs in Chapter 4, and show that an individual’s songs reflect both their immigration status and their breeding territory location. I also find that songs change between subsequent years, and that this is largely due to the appearance and disappearance of individual birds. Together, my findings suggest that individuals continuously adapt their vocal behavior to changing social environments, and that interactions with both conspecifics and heterospecifics may shape vocal communication systems.

BIOGRAPHICAL SKETCH

Sara Keen was born in the small coastal city of Melbourne, Florida. As a child, she spent much of her time outdoors and enjoyed many long afternoons exploring the woods beside her house and climbing trees. She attended college at the University of Florida, where she studied electrical engineering, was an active member of the machine intelligence laboratory, and was encouraged, supported, and inspired by her dynamic undergraduate advisor, Dr. Eric Schwartz. Throughout college, she explored her interests in programming and machine intelligence by participating in an NSF REU summer program in robotics, being a teaching assistant for microprocessor classes, and leading robotics summer camps for elementary school students. Outside of the laboratory, she frequently ran, cycled, and explored the natural areas in Gainesville, Florida, and became fascinated with the natural world.
After finishing her undergraduate degree, she taught middle school English in Cali, Colombia, which offered more chances to explore different ecological and social landscapes, establish longstanding friendships, and develop fluency in Spanish. Following this, she worked as a technical assistant in a robotics laboratory at Yale University, and decided to return to the University of Florida to complete a master’s degree in electrical engineering. She worked with Dr. John Harris, who introduced her to the world of digital signal processing and who also demonstrated the essential skill of carrying out rigorous research while having as much fun as possible. During this period Sara worked as an intern at the Bioacoustics Research Program at the Cornell Lab of Ornithology, and became fascinated by the world of bioacoustics. Her experience in the bioacoustics group led Sara to take a research position at the National University of Singapore for one year, and to then complete an MA in conservation biology with Dr. Dustin Rubenstein at Columbia University. During this period, Sara had the opportunity to apply acoustic analyses to study bird behavior, and she was inspired to continue this research as a PhD student in the Department of Neurobiology and Behavior at Cornell University. Sara was advised by Dr. Kern Reeve, who supported and encouraged her as she conducted field work, made numerous trips to her field site in England, and gained invaluable insights through long conversations about science at Ithaca Coffee Company. While in Oxford, she received essential guidance and encouragement from Drs. Ben Sheldon and Ella Cole during her studies in Wytham Woods. During this time, Sara developed and carried out her dissertation project, developed essential research skills, and made wonderful friendships in both Ithaca and Oxford.
Sara plans to start a post-doctoral position in the Geology Department at Stanford University where she will continue to combine engineering and biology to better understand the natural world.

For my family

ACKNOWLEDGMENTS

I would like to thank everyone who supported me in completing this dissertation; it would not have been possible without their help. Foremost, I thank Kern Reeve, who was essential to all of this. Thank you for the time you took to discuss numerous concepts and models during our meetings, for shaping how I approach problems, and for demonstrating how to be a good scientist and human. Kern’s enthusiasm for ecological research and for asking big questions helped me to thrive in NBB and to persist in the challenging times as well. His energy and support were crucial in completing this project. I also thank Ben Sheldon, whose guidance and encouragement made this study possible. Ben’s curiosity about the world, his ability to identify and lead me to interesting questions, and his welcoming me to his lab group in Oxford were an incredible academic opportunity and a chance to form great friendships and collaborations. Working in Wytham Woods was a privilege and the days spent at the field station are among my favorite times during my PhD. I am extremely grateful to the many other academic mentors who helped support and advise me during this process. Mike Sheehan and Mike Webster both gave invaluable feedback which helped to improve my project and experimental design, and through our conversations both helped me to become a stronger researcher by sharing their own experiences and insights. Thank you to Holger Klinck for offering an endless supply of energy, technical expertise, recording equipment, and reassurance that it was all good. I thank him especially for not letting me return to my previous job when I was discouraged in my second year, and for beginning CCB’s weekly tradition of Friday evening gatherings.
Thank you to Ella Cole for sharing her time and Wytham expertise and for helping me find my bearings in Oxford; to Keith McMahon for teaching me how to ring birds, helping me fix my mistakes during field work, being available for critical phone calls from the field, and not taking the mickey every chance he got; to Lucy Aplin for showing me how to work in the aviary despite being in the midst of her own projects, and for her excellent advice on conducting experiments; to Josh Firth for sharing his expertise on Wytham birds, coding, and social networks, and for being a remarkable example of excellent time management; to Karan Odom, Marcelo Aray-Salas, Russ Charif, Wendy Erb, Maria Modanu, Liz Bergen, and Hailey Scofield for their help with developing my research as well as their friendship during the last five years. I feel enormously fortunate to have found wonderful communities in both Ithaca and Oxford while leading parallel lives these last five years. Thank you to Emma Greig, Sarah Alexander, Aditi Sahasrabuddhe, Vannina Ettori, Emily D’Angelo, Prantik Mazumder, Rohini Jalan, Kieron Guinemarde, Sarah Rugheimer, Freddy Hilleman, Ash Sendall-Price, Allison Roth, Benjamin Van Doren, Dena Clink, Ana Verarhami, Bobbi Estabrook, Yu Shiu, Liz and Joe Rowland, and Peter Wrege for your friendship and inspiration throughout this process. Most of all, I would like to thank my parents, Eric, and Emily, for always encouraging me and for each being role models of how to live a good life. I feel incredibly lucky to have ended up with such a remarkable family and for the support that each of you offered during this project.
Lastly, I would like to thank the agencies and organizations that helped to support this work, including Cornell Lab of Ornithology Athena Fund for enabling so many field seasons, the Center for Conservation Bioacoustics, the Cornell Lab of Ornithology and the fantastic oversight of students by Irby Lovette, Oxford’s Edward Grey Institute for Field Ornithology, Sigma Xi, and the Cornell Department of Neurobiology and Behavior Animal Research grant.

TABLE OF CONTENTS

BIOGRAPHICAL SKETCH
ACKNOWLEDGMENTS
TABLE OF CONTENTS
CHAPTER 1
LITERATURE CITED
CHAPTER 2
LITERATURE CITED
CHAPTER 3
LITERATURE CITED
CHAPTER 4
LITERATURE CITED
APPENDIX A
APPENDIX B
APPENDIX C

CHAPTER 1

SOCIAL LEARNING OF ACOUSTIC ANTI-PREDATOR CUES OCCURS BETWEEN WILD BIRD SPECIES

Sara C. Keen (1,2), Ella F. Cole (2), Michael J. Sheehan (1), Ben C. Sheldon (2)
(1) Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14850, USA
(2) Edward Grey Institute, Department of Zoology, University of Oxford, Oxford, UK, OX1 3PS

ABSTRACT

In many species, individuals gather information about their environment both through direct experience and through information obtained from others. Social learning, or the acquisition of information from others, can occur both within and between species and may facilitate the rapid spread of antipredator behaviour. Within birds, acoustic signals are frequently used to alert others to the presence of predators, and individuals can quickly learn to associate novel acoustic cues with predation risk.
However, few studies have addressed whether such learning occurs only through direct experience or whether it has a social component, nor whether such learning can occur between species. We investigate these questions in two sympatric species of Parids: blue tits (Cyanistes caeruleus) and great tits (Parus major). Using playbacks of unfamiliar bird vocalisations paired with a predator model in a controlled aviary setting, we find that blue tits can learn to associate a novel sound with predation risk via direct experience, and that antipredator response to the sound can be socially transmitted to heterospecific observers, despite lack of first-hand experience. Our results suggest that social learning of acoustic cues can occur between species. Such interspecific social information transmission may help to mediate the formation of mixed-species aggregations.

INTRODUCTION

A central question in behavioural ecology is how learned traits spread through a population, and which individual characteristics may facilitate or impede their social transmission. Reflecting the increasing interest in this question is a growing body of literature which demonstrates the high adaptive value of social learning [1-4]. Unlike acquiring information directly, which requires a process of trial-and-error and often increases predation risk, acquisition of information from others can allow individuals to quickly learn about their surroundings and adjust to changing environments at a relatively lower cost [1-3]. This mechanism can enable rapid horizontal transmission of antipredator behaviour through a population, thereby directly impacting individual survival [5, 6]. Consequently, selection may act upon individuals’ capabilities for social learning and social acquisition of traits [3, 4], making this area of research important in advancing our understanding of biological evolution and adaptation.
Furthermore, because the acquisition of learned behaviours is an important mechanism in the establishment of animal culture, investigating this process could give insight into the emergence and persistence of novel traditions within a population [1, 2, 7]. In order to reduce uncertainty about the surroundings, information may be acquired from both con- and heterospecifics, though the amount of overlap in the ecological niches occupied by the producer and receiver must be considered. For example, information acquired from heterospecifics that use comparable foraging strategies or experience similar predation risks is more useful than information gathered from species that rely on different food sources and/or are hunted by different predators, and may therefore be more likely to transmit across species boundaries [8-10]. In recent years, a number of studies have documented social learning both within and between species [11-15]. However, to date, much of the evidence of the spread of learned traits comes from conspicuous behaviours such as tool use in primates, propagation of foraging strategies in birds, and learned birdsong, e.g., [16-18]. Furthermore, experimental manipulations of social transmission of traits are few, and the best studied examples entail gathering information from conspecifics [e.g., 17, 19]. As we aim to better understand the spread of behavioural traits, the boundaries of social transmission must be examined from multiple angles, including a range of modalities and transmission between individuals with different phenotypes.
Many species of birds and mammals commonly use acoustic signals and cues to acquire information about predators [12, 13, 20-22], and a diverse array of alarm calling behaviours can be observed in different contexts, including calls directed at predators during mobbing events, distress calls made during predator attacks, calls produced whilst fleeing predators, and sentinel calls that alert nearby individuals to perceived risk levels [23]. Although information about predators is often obtained using acoustic signals produced by conspecifics, which may evolve through processes such as kin selection or reciprocal altruism [24, 25], many birds and mammals commonly eavesdrop on signals intended for others [23]. Eavesdropping on heterospecifics may play an important role in the formation of mixed species assemblages [10], and response to heterospecific alarm calls can either be innate (e.g., if calls are acoustically similar among species [26-28]), or learned [24, 29], which may occur as early as the embryonic stage in birds [30]. In addition to learning heterospecific alarm calls, recent experimental evidence has shown that birds and mammals can learn to associate unfamiliar acoustic cues with perceived predation risk [14, 31, 32], adding to a growing body of literature suggesting that associative learning may be the mechanism underpinning the recognition of heterospecific alarm calls. Recent research has also shown that birds can learn to associate novel sounds with heterospecific alarm calls [15], suggesting that a behavioural response to an acoustic cue can be socially transmitted, even when the cue is not initially recognised as an alarm call. The possibility that this phenomenon can occur between species has been suggested [15], but not formally tested.
Here, we study birds captured from sympatric populations of blue tits (Cyanistes caeruleus) and great tits (Parus major), which spend the winter months foraging together in mixed-species flocks and use calls to alert others to the presence of predators such as the Eurasian sparrowhawk (Accipiter nisus) [33]. This shared suite of natural history traits suggests that interspecific social learning is likely to occur (see [11] for a review of social learning between sympatric species), yet little experimental research investigating this question has been conducted. To address this question, in this study we investigate social learning of acoustic antipredator cues in two ecologically relevant contexts: within and between species. To test our hypotheses that intra- and interspecific social transmission occurs among blue tits and great tits, we carried out a two-stage experiment. First, using playbacks paired with a predator model, we trained groups of blue tit demonstrators to associate a novel acoustic cue with predation risk. We then introduced naïve blue tit and great tit observers and conducted multiple playbacks of the acoustic cue while demonstrators and observers were housed together. Importantly, the predator model was not used during this stage of the experiment, ensuring that observers had access only to social information, and not to private information, that could convey predation risk. We predicted that both conspecific and heterospecific observers would acquire an antipredator response to the acoustic cue despite having no direct exposure to the predator model, and independently tested observers to determine whether intra- and interspecific social transmission had occurred.

METHODS

Study site and species.
The subjects for this experiment were eight great tits (Parus major) and 48 blue tits (Cyanistes caeruleus) captured using mist nets from a wild population at Wytham Woods, Oxfordshire, UK (51°46′ N, 1°20′ W) between 29th December 2015 and 8th March 2016.
Blue tits were used as both demonstrators and observers and thus more individuals of this species were included. All birds were fitted with a unique radio frequency identification (RFID) tag and metal BTO leg band as well as a temporary color band that was worn for the duration of the experiment. Upon catching, we determined the age (yearling or older) and sex of all birds based on plumage characteristics [33] (Sex: great tits: 6 males, 2 females; blue tits: 27 males, 15 females, and 3 individuals where sex could not be determined); (Age: great tits: 6 yearling and 2 older; blue tits: 33 yearling and 15 older). We randomly selected birds to use in this experiment from all individuals captured during mist netting, and did not take age or sex into consideration. For each replicate of our experiment, 6 blue tits and 1 great tit were captured together and kept in captivity for seven days before being released at the site of capture. We conducted all experiments in an outdoor aviary at the John Krebs Field Station, Wytham, Oxfordshire, UK, between 29th December 2015 and 8th March 2016 (Fig. 1). Two cameras, an iPhone 5s and a Logitech C920 HD Pro Webcam, were mounted on different walls such that the majority of the aviary space could be filmed. We placed a feeder station stocked with sunflower seeds and equipped with an RFID antenna and data logger in the center of the aviary, which allowed the time and individual identity of birds visiting the feeder to be recorded. Due to inconsistent wiring connections, RFID readers did not record some feeder visits. Therefore, for any feeder visit that was noted during video analysis but not recorded by the RFID logger, we determined identity using colored leg bands which could be seen in video footage.

Figure 1. Diagram of outdoor aviary in which experiments took place.
Labels refer to (a) box in which model sparrowhawk was positioned between training playbacks, (b) feeder station equipped with PIT tag reader, (c) zipline across which model sparrowhawk was flown, (d) booth with opaque walls in which the experimenter sat during playbacks, (e) cameras, (f) speaker, (g) adjacent buildings, (h) empty adjacent outdoor passageway.

The experiment was replicated eight times, following the protocol summarized in Table 1. In total, we tested 40 blue tit demonstrators, eight blue tit observers, and eight great tit observers. Due to camera failure, one replicate of the pre-training tests (replicate 1) and two replicates of the post-training playback tests (replicates 1 and 8) of demonstrators had to be excluded. All post-training playback tests of observers were filmed. Thus, the final sample sizes were N=35 demonstrators for pre-training tests, N=30 demonstrators for post-training tests, and N=8 for post-training tests of both the blue tit and great tit observers. Demonstrator groups contained (mean ± SE) 3 ± 0.3 males and 1.85 ± 0.4 females, and 2.8 ± 0.8 yearlings and 2.2 ± 0.8 older birds. Distributions of latency to resume feeding after playbacks within males and females were not significantly different in either pre-training or post-training playbacks (pre-training: t-test: t = -1.21, df = 11.71, p = 0.25; post-training: t = -1.98, df = 20.84, p = 0.06), nor were distributions of latency to resume feeding within yearlings and older birds (pre-training: t-test: t = 0.27, df = 18.1, p = 0.79; post-training: t = -1.87, df = 31.2, p = 0.07). For this reason, and because our sample size did not allow sufficient statistical power to include these factors in our analysis, all demonstrators from the same replicate were grouped together regardless of age and sex.

Table 1. Protocol for single replicate. The experiment was replicated eight times, each time using five blue tit demonstrators, one blue tit observer and one great tit observer.
Day   Protocol
1     o Birds captured from the wild and released into the aviary
      o Pre-training playback tests
      o Move observers to the indoor aviary
2     o Demonstrator training with predator model and playback (x 4)
3     o Demonstrator training with predator model and playback (x 4)
      o Demonstrator playback test for associative learning
      o Add observers into outdoor aviary with demonstrators
4     o Observer training with demonstrators and playback only (x 5)
      o Place all demonstrators and one observer indoors
5     o Playback test for social learning with observer 1
6     o Playback test for social learning with observer 2
7     o Release birds at site of capture

Experimental design.
To test our hypotheses that intra- and interspecific social learning of acoustic cues can occur, this experiment necessarily comprised two stages: (1) training demonstrators to associate a novel sound with a predation event (i.e., associative learning), and (2) exposing untrained conspecific and heterospecific observers to the trained demonstrators to test whether this behaviour is transferred horizontally (i.e., social learning) (Fig. 2). To ensure that birds learned to associate the sound with predation and did not simply exhibit a neophobic response, we used acoustically similar “control” and “treatment” sounds as stimuli: recordings of songs from a Northern Cardinal (Cardinalis cardinalis) and an Eastern Whip-poor-will (Antrostomus vociferus). These signals occupy approximately the same frequency range as tits’ vocalizations (1.5 - 6 kHz), and are from North American species, and therefore unfamiliar to all birds used in the experiment. We downloaded both recordings from Xeno-canto [34] and normalized their amplitude using Audacity 2.1.1 [35] such that both recordings were of equal amplitude and eight seconds in duration (Fig. 3).
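The stimulus preparation step (equal amplitude, eight seconds in duration) can be sketched in a few lines. This is only a minimal illustration, assuming simple peak normalization and a hard cut at eight seconds; the Audacity settings actually used are not stated, and the function names, the target peak of 0.9, and the synthetic tones standing in for the recordings are all hypothetical.

```python
import math

def trim_to_duration(samples, sample_rate, seconds=8.0):
    """Keep only the first `seconds` of a mono recording."""
    return samples[: int(round(sample_rate * seconds))]

def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the largest absolute value equals target_peak.

    target_peak is an assumed value, not a setting reported in the study.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

def prepare_stimulus(samples, sample_rate, seconds=8.0, target_peak=0.9):
    """Trim first, then normalize, so the retained segment reaches the target peak."""
    trimmed = trim_to_duration(samples, sample_rate, seconds)
    return peak_normalize(trimmed, target_peak)

def sine(freq_hz, amplitude, sample_rate, seconds):
    """Synthetic tone used here as a stand-in for a real recording."""
    n = int(sample_rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

# Two hypothetical recordings with different loudness and length.
rate = 8000
control = prepare_stimulus(sine(2000, 0.2, rate, 10), rate)
treatment = prepare_stimulus(sine(3000, 0.7, rate, 9), rate)
```

After this step both hypothetical stimuli have the same length and the same peak amplitude, so any difference in response cannot be attributed to one sound simply being louder or longer.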
We placed Dell AX210 speakers approximately 1 m from the feeder station aiming towards the center of the aviary for playbacks, and adjusted the volume such that sounds played at an amplitude of approximately 65 dB at 10 m, the amplitude at which great tits sing in the wild [36], and within the range of amplitude at which great tits produce alarm calls [37]. Playbacks were always initiated when at least one bird was foraging at the feeder station, and this rule was used in tests of demonstrator groups as well as in tests of individual observers. In all stages of the experiment, we always separated playbacks by at least one hour. For each replicate, we alternated which sounds served as treatment and control stimuli, and, in order to minimize biases for factors such as motivation to feed, which may decrease throughout the day, we alternated the order in which the control and treatment sounds were used and the order in which observers were tested (Table 2).

Figure 2. Graphical overview of experiment.

Figure 3. Spectrograms of sounds used for control and treatment playbacks plotted with Raven Pro 1.5 (www.birds.cornell.edu/raven) with 4095 point FFTs, Hann window, and 50% overlap. Sounds were downloaded from xeno-canto.org and amplitude-normalized and edited to 8 s duration. a) Northern Cardinal, b) Eastern Whip-poor-will.

Table 2. Order of playback stimuli and observer testing for the 8 replicate groups. In the first four replicates, recordings of an Eastern Whip-poor-will and a Northern Cardinal were used as the treatment and control sounds, respectively; in the second four replicates this was reversed.
Replicate   Pre-training     Direct learning        Social learning         Observer       Playback
group       playback test    playback test          playback test           test order     stimuli
                             (5 BT demonstrators)   (BT and GT observers)
            1st    2nd       1st    2nd             1st    2nd              1st    2nd
1           Trmt   Ctrl      Ctrl   Trmt            Ctrl   Trmt             GT     BT      Ctrl: NC, Trmt: WPW
2           Ctrl   Trmt      Trmt   Ctrl            Trmt   Ctrl             GT     BT      Ctrl: NC, Trmt: WPW
3           Trmt   Ctrl      Ctrl   Trmt            Ctrl   Trmt             BT     GT      Ctrl: NC, Trmt: WPW
4           Ctrl   Trmt      Trmt   Ctrl            Trmt   Ctrl             BT     GT      Ctrl: NC, Trmt: WPW
5           Trmt   Ctrl      Ctrl   Trmt            Ctrl   Trmt             GT     BT      Ctrl: WPW, Trmt: NC
6           Ctrl   Trmt      Trmt   Ctrl            Trmt   Ctrl             GT     BT      Ctrl: WPW, Trmt: NC
7           Trmt   Ctrl      Ctrl   Trmt            Ctrl   Trmt             BT     GT      Ctrl: WPW, Trmt: NC
8           Ctrl   Trmt      Trmt   Ctrl            Trmt   Ctrl             BT     GT      Ctrl: WPW, Trmt: NC
Abbreviations: Trmt, treatment sound; Ctrl, control sound; BT, blue tit; GT, great tit; NC, Northern Cardinal; WPW, Eastern Whip-poor-will.

We fixed two large plastic boxes to the aviary ceiling with a cable running between them, upon which a sparrowhawk model could be flown across the 3 m aviary width in under 0.5 s. Eurasian sparrowhawks (Accipiter nisus) are a primary cause of mortality among tits in Wytham Woods [38], and in previous experiments, great tits and blue tits have been shown to react to such models as they would to live predators [39, 40]. The model was a plastic bird that was hand painted to closely resemble a sparrowhawk and was approximately the size of an adult male (length 350 mm, wingspan 560 mm). This model was also used in previous predator exposure experiments conducted using this population [40]. The openings of both boxes had plastic curtains such that the model was not visible when inside the box.

Experiment protocol.
On the first day of an experiment, we caught six blue tits and one great tit before 0900 hr and placed all birds in the aviary within an hour of capture. After approximately one hour, we conducted pre-training playback tests using both the control and treatment sounds and filmed the group of seven birds for five minutes following each playback. Using this footage paired with RFID records from the feeder station, we measured the latency to resume feeding for all individuals. Latency was defined as the time from the end of the playback until first contact with the feeder.
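The latency definition above, combined with the five-minute filming window, can be expressed as a short function. This is a minimal sketch under stated assumptions: timestamps are taken to be seconds on a shared clock, a bird with no feeder contact inside the window is given the window length and flagged as censored (as the statistical analysis section describes), and the function and variable names are hypothetical rather than taken from the study's code.

```python
OBSERVATION_WINDOW_S = 300.0  # five minutes of filming after each playback

def latency_to_resume_feeding(playback_end_s, feeder_visit_times_s,
                              window_s=OBSERVATION_WINDOW_S):
    """Return (latency_s, observed) for one bird and one playback.

    latency_s is the time from the end of the playback to the bird's first
    feeder contact. If no contact falls inside the window, the latency is
    set to the window length and observed is False (right-censored).
    """
    delays = [t - playback_end_s for t in feeder_visit_times_s
              if 0.0 <= t - playback_end_s <= window_s]
    if delays:
        return min(delays), True
    return window_s, False

# Hypothetical example: playback ends at t = 100 s.
returned = latency_to_resume_feeding(100.0, [50.0, 130.5, 200.0])
stayed_away = latency_to_resume_feeding(100.0, [50.0, 450.0])
```

Keeping the censoring flag alongside the latency matters because a bird that never returned within five minutes is not the same observation as a bird that returned at exactly 300 s.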
Approximately 30 minutes after playbacks were complete, we moved one blue tit and one great tit (hereafter referred to as observers) into the indoor aviary where they were housed together (see Appendix A for detailed description of indoor aviary). Five blue tits (hereafter referred to as demonstrators) remained in the outdoor aviary. The blue tit observer was selected as the first blue tit to fly into a mist net placed in the aviary.

Training the demonstrators.
During the second and third days of an experiment, we trained the five demonstrators to associate the treatment sound with the presence of a predator by conducting eight repeat treatments (4 per day) during which a model Eurasian sparrowhawk was flown across the top of the aviary as the sound was broadcast over the speakers. All playbacks took place between 0900 and 1500 hr and were separated by at least one hour. At the end of the third day, to test whether the birds had learnt to associate the sound with the attempted predation event, we performed two additional playbacks using the treatment and control sounds, but not exposing birds to the predator model. The demonstrators were filmed for five minutes immediately following the final two playbacks; from this footage we extracted latency to resume feeding for all individuals as well as the number of alarm calls in order to compare pre- and post-training responses. Vocalizations matching descriptions of vocalizations produced by blue tits in response to predator presentations [41] were considered to be alarm calls. When analysing videos of playbacks, we counted all alarm calls and then calculated the average number of alarm calls per bird, as it was not possible to assign calls to individuals during trials. All video analyses were conducted in a blind manner. Following this test, the great tit and blue tit observers were returned to the outdoor aviary containing the demonstrators.

Training the observers.
In the second stage of the experiment we tested our prediction that observers could socially learn to associate a sound with danger without having the direct experience of simultaneously seeing the predator model. To facilitate social transmission, on the fourth day of an experiment we conducted five playbacks of the treatment sound while the five demonstrators and the conspecific and heterospecific observers were in the aviary together over the course of one day. We did not use the predator model during these tests, ensuring that any antipredator behaviors that the observers developed in response to the treatment sound were not due to direct experience of a potential predator. At the end of the fourth day, we moved the five demonstrators and one observer indoors.

Testing the observers.
The next day, we conducted two playbacks with the observer that remained in the aviary (observer 1), once using the treatment sound and once using the control sound, and never exposing the observer to the predator model. Both playbacks were filmed; a blind observer used this footage to measure latency to resume feeding and number of alarm calls made in five minutes immediately following each playback. Vocalizations matching descriptions of blue tit and great tit alarm calls [41] were considered to be alarm calls produced by blue tit and great tit observers, respectively. At the end of day 5, we moved observer 1 indoors and placed observer 2 in the aviary, and performed identical playback tests the following day. Testing the observers separately ensured that they were responding only to the playback sound, rather than social cues from nearby birds. The next morning, we released all birds at the location where they were captured.

Quantification and statistical analysis.
In pre-training tests with demonstrators (N=7 replicates), 17 of 35 birds returned to the feeders after playbacks of the control sound (mean ± standard error: 2.83 ± 0.48 per replicate), and 13 of 35 birds returned after playbacks of the treatment sound (2.17 ± 0.4 per replicate). In post-training demonstrator tests (N=6 replicates), 20 of 30 birds returned after control playbacks (3.33 ± 0.49 per replicate) and 20 of 30 returned after treatment playbacks (3.33 ± 0.42 per replicate). To determine whether demonstrators learned to associate the treatment sound with a potentially dangerous event (i.e., whether associative learning had occurred), we conducted a survival analysis using a mixed effect Cox model to identify differences in latency to resume foraging between the pre- and post-training tests. We used two separate survival analyses, restricting the dataset first to pre- and then to post-training measurements, to assess whether demonstrators took significantly longer to resume foraging after the treatment versus control playbacks before and after training. We included stimulus (control or treatment sound) as a fixed binary effect, and individual bird identity and group number as random effects. By including bird identity in our model, we aimed to minimize the effects of noise in the latency measurements caused by variation between individuals in motivation to feed. In cases where an individual bird did not resume foraging within five minutes following a playback (60 of 130 demonstrator observations), latency times were censored. We also used paired t-tests to determine whether demonstrators made significantly more alarm calls within five minutes of control versus treatment playbacks after training, and used separate tests to compare demonstrators’ response before and after training.
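To illustrate how censored latencies enter a survival analysis, the sketch below computes a Kaplan-Meier curve, where "survival" is the proportion of birds that have not yet resumed foraging. This is not the mixed effect Cox model used in the study (which was fitted in R); it is a simpler, assumed stand-in that shows only the handling of right-censored observations. The function name and the example data are hypothetical.

```python
def kaplan_meier(latencies_s, observed):
    """Kaplan-Meier estimate for right-censored latencies.

    latencies_s: time to resume foraging, or the censoring time.
    observed:    True if the bird resumed foraging, False if censored.
    Returns a list of (time, proportion_not_yet_foraging) steps.
    """
    data = sorted(zip(latencies_s, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all birds sharing the same time point.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        # Censored birds leave the risk set without producing a step.
        n_at_risk -= leaving
    return curve

# Hypothetical data: four birds returned, two were censored at 300 s.
latencies = [20.0, 45.0, 45.0, 120.0, 300.0, 300.0]
returned = [True, True, True, True, False, False]
curve = kaplan_meier(latencies, returned)
```

In this example the curve never reaches zero, because the two censored birds contribute to the risk set throughout the window without ever registering a return.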
To test whether observers learned to associate the treatment sound with danger (i.e., whether social learning had occurred), we conducted separate survival analyses for blue tit observers and great tit observers, with latency to resume foraging as the response variable, playback stimulus as a fixed effect, and individual identity as a random effect. Latency values were censored when an individual did not resume foraging within five minutes; this occurred in two of 16 trials of blue tit observers and in two of 16 trials of great tit observers. We used paired t-tests to determine whether birds produced significantly more alarm calls following playback of the treatment sound than following playback of the control sound. Analyses were performed using the coxme and BSDA packages in R 3.4.1 [42-44]. See Appendix A for further details of experimental procedures.

RESULTS

Associative learning of acoustic cues. Our results suggest that blue tit demonstrators learned to associate the novel cue with a predation threat. Before training, blue tit demonstrators showed no difference in latency to resume foraging after playbacks of the control or treatment sounds (mean ± SE: control: 120.8 ± 19.8 s; treatment: 104.51 ± 18.8 s; Cox mixed effects model: χ2 = 1.96, df = 1, p = 0.161; Fig. 4a, c), suggesting that there is no innate aversion or attraction to either sound. After training, demonstrators took significantly longer to resume foraging after treatment playbacks compared to control playbacks (control: 90.2 ± 12.53 s; treatment: 117.21 ± 14.3 s; Cox mixed effects model: χ2 = 5.81, df = 1, p = 0.016; Fig. 4b, d). This suggests that the experimental training was successful in causing the demonstrators to associate the treatment sound with the presence of a predator. Both before and after training, demonstrator groups did not produce significantly more alarm calls in response to treatment vs.
control playbacks (before: t = -0.42, df = 11.1, p = 0.68; after: t = 0.77, df = 8.94, p = 0.46).

Social transmission of antipredator response to acoustic cues. After exposure to trained demonstrators, great tit observers exhibited different behavioural responses to control versus treatment playbacks, whereas blue tit observers exhibited no detectable difference. Great tit observers took significantly longer to resume feeding after treatment playbacks (mean ± SE: control: 48.3 ± 17 s, treatment: 72.4 ± 16.3 s, Cox mixed effects model: χ2 = 7.88, df = 1, p = 0.005, Fig. 5b, d). They also made more alarm calls in the first five minutes after playbacks of the treatment sound, but this difference was not statistically significant (control: 12.5 ± 3.3, treatment: 21.9 ± 6.8, t = -1.31, df = 7, p = 0.23, Fig. 5f). Blue tit observers took longer to resume foraging and made more alarm calls after treatment playbacks than after control playbacks, but neither effect was statistically significant (latency: mean ± SE: control: 37.5 ± 15.3 s, treatment: 89.3 ± 42.7 s, Cox mixed effects model: χ2 = 1.50, df = 1, p = 0.221, Fig. 5a, c; alarm calls: mean ± SE: control: 2.86 ± 1.01, treatment: 6.75 ± 1.93, t = -1.93, df = 7, p = 0.09, Fig. 5e).

Figure 4. Associative learning of acoustic cues within demonstrator groups. a) Survival curves showing demonstrator latency to resume foraging before training. Demonstrators did not take significantly longer to resume foraging after playbacks of the treatment sound (dashed line) than after playbacks of the control sound (solid line; see Results). b) Survival curves showing demonstrator latency to resume foraging after training. Demonstrators took significantly longer to resume foraging after playbacks of the treatment sound (dashed line) than after playbacks of the control sound (solid line; see Results). c) Demonstrator latency to resume foraging after treatment and control sound playbacks before training. d) Demonstrator latency to resume foraging after playbacks after training.
Large black dots and bars represent means and standard errors. Small grey dots represent individual birds, and lines indicate paired samples from the same individual within a single replicate. Asterisks correspond to p < 0.05; NS corresponds to p ≥ 0.05. Censored birds (i.e., those that did not return within 300 seconds) are not shown.

Figure 5. Tests for social transmission of antipredator response to heterospecific and conspecific observers after exposure to trained demonstrators. a) Survival curves showing conspecific observer latency to resume foraging after playbacks of the treatment sound (dashed line) versus the control sound (solid line). b) Survival curves showing heterospecific observer latency to resume foraging after social training. c) Latency of conspecific observers to resume foraging after treatment playbacks and control sound playbacks. d) Latency of heterospecific observers to resume foraging after treatment playbacks and control sound playbacks. e) Number of alarm calls made by conspecific observers after playbacks. f) Number of alarm calls made by heterospecific observers after playbacks. Large black dots and bars represent means and standard errors. Small grey dots represent individual birds, and lines indicate paired samples from the same individual within a single replicate. Note that individual birds that did not return within 300 s are not shown. Asterisks correspond to p < 0.05; NS corresponds to p ≥ 0.05.

DISCUSSION

Evidence of interspecific social transmission of antipredator behaviour. Together, our results suggest that heterospecific observers can learn to associate a novel cue with predation threat without first-hand experience.
Our results support findings from previous experimental work showing that antipredator behaviour can be acquired both through first-hand experience and secondary associations [14-15], and support the suggestion that flocking with heterospecifics gives greater access to social information that can enhance survival [45]. Social learning of predator avoidance may offer an adaptive advantage in dynamic environments; because our study population experiences strong spatial and temporal variation in food availability and predation risk, behavioural plasticity is likely under strong selection in these species [46]. Furthermore, as unfamiliar sounds were readily learnt, we suggest that both species exhibit an innate preparedness that increases likelihood of learning to associate any acoustic cue with predation. Despite finding evidence that social transmission of antipredator information occurs between blue tits and great tits, we did not detect the same significant effect amongst blue tit demonstrators and observers. Specifically, while great tit observers increased latency to resume feeding after treatment playbacks, blue tit observers exhibited a non-significant increase in both alarm calling and latency to resume feeding. One possible explanation for the lack of observed conspecific social transmission is that Parid species differ in the manner in which they respond to predators. For example, blue tits have been shown to exhibit significantly more wing-flicking when presented with predator models that move and produce calls as compared to motionless, silent models [47], and perhaps great tits respond differently to such changes in predator model behaviour. It may also be the case that learning in the absence of a predator requires more repetition within blue tits; previous work in which birds were trained without direct predator exposure included 10-12 training sessions [14,15], which is five to seven more than observers received in our experimental design.
We also note that our sample sizes were relatively small (seven blue tit observers and seven great tit observers). Given that the non-significant responses of blue tit observers to the playbacks are in the expected direction, we cannot rule out the possibility that the absence of a detectable change in behaviour is due to lack of statistical power (see Appendix A). Although our results support the hypothesis that blue tits can learn to associate a novel acoustic cue with predation risk through direct experience with a predator, additional experiments that perhaps have longer training periods are needed to determine whether this behaviour can be socially transmitted between individuals in this species. One issue that must be considered when interpreting our results is potential sensitization to the treatment sound due to repeated exposure during training. Because control sounds were presented only during test trials before and after training, whereas treatment sounds were presented multiple times, focal birds may have exhibited heightened responsiveness during treatment playbacks. However, if our results were caused by sensitization to the treatment sound, we would expect latency to feed after the sound was played to be significantly longer after repeated exposure. Rather, we saw a decrease in latency to return when birds were repeatedly exposed to the control sound (see Fig. 2). This suggests that, rather than birds becoming sensitized to the trained sound, they remained wary of the stimulus because it was paired with predator presentations and desensitized to the control sound, which was not associated with a threat. In future experiments, we advise that control sound playbacks that are not paired with a predator are conducted during the training period to enable testing of this alternative explanation [e.g. 14, 31]. 
Intriguingly, blue tit demonstrators did not produce significantly more alarm calls following exposure to treatment playbacks, suggesting that great tit observers learned from other aspects of the demonstrators’ behaviour rather than from their alarm calling. Although our findings cannot exclude the possibility that Parids also acquire anti-predator responses via acoustic association, our results present a different mechanism by which they may learn about predation risk. This therefore builds on recent work that has demonstrated that social learning can occur through acoustic association [15], and also suggests that there may be numerous ways in which individuals can acquire information about predators.

Level of perceived risk may encourage social learning. Interestingly, naïve observers adopted demonstrators’ behaviour despite a lack of reinforcement during training, as the predator model was not presented after the initial demonstrator training. One possible explanation for this is that when costs of ignoring a cue are high, even unreliable social information is favoured over personal information [48]. Thus, as perceived risk increases, individuals are expected to copy rather than learn independently [49]. This tendency can enable extreme examples of cultural transmission of antipredator response to benign heterospecifics [50, 51], and can be used to train captive-bred animals before release [52]. Ultimately, learning strategy is likely determined by several factors, including the relative reliability of social and personal information, perceived cost of direct learning, degree of environmental variability, number of demonstrators, as well as observer and demonstrator identity.

Ecological and evolutionary implications. Two possible explanations for our results are that (1) the treatment sound is perceived as a vocalization produced by a novel predator, or (2) the treatment sound is perceived as an alarm call from a novel species.
Neither can be ruled out within this experimental design; however, because sparrowhawks hunt primarily by surprise, as simulated in demonstrator training, the second alternative may be more likely. In either case, our results add to evidence that animals with complex vocal behaviours have evolved to efficiently process and use acoustic information, and that sympatric species may experience selection pressure to acquire acoustic information from both con- and heterospecifics. The ability to rapidly recognize and adjust behaviour in response to acoustic cues is expected to be adaptive for species that have evolved to efficiently encode and process sounds, such as most vertebrates [53], and particularly passerine birds, which perform complex vocal communication tasks and maintain awareness of their acoustic environment [54]. These findings also suggest that within mixed-species communities, individuals may be predisposed to sharing and efficiently using social information from sympatric individuals, regardless of species. Our findings also add to research showing that social information transmission can facilitate recognition of novel predators [52], and suggest that social information acquired from heterospecifics may enable adaptation to dynamic environments [55]. One constraint of this experiment was that a single exemplar of each sound was used; we were therefore unable to test whether receivers were able to recognize a general class of non-identical acoustic signals. In order to determine whether our findings extend more broadly to alarm calling in wild animals, further experiments in which the acoustic parameters and presentation of the signal are varied are required. Finally, we suggest that future experiments also videotape playbacks in a manner that allows for measuring individual hiding and freezing behaviour.
Although this was not feasible given the layout of the aviary in which we conducted this experiment, it may be an important behaviour used by Parids in response to model predator presentations. Taken together, our results suggest that social transmission of predator avoidance behaviour occurs between species, and that using social information rather than private information may be favoured in the context of predator avoidance. Ultimately, our findings may also help to explain how species-level attributes and interspecific social learning could mediate the formation of mixed-species communities and the establishment of new traditions and cultures.

ACKNOWLEDGMENTS

We thank L.M. Aplin for her insightful suggestions during the development of this project; K. McMahon, F. Bell, D. Wilson, and N. Carlson for their valuable assistance during field work; H. Klinck and the Bioacoustic Research Program for technical advice and support; M.A. Pardo and E.L. Mudrak for assistance in statistical analysis; and H.K. Reeve, Russel Ligon, the Cornell Animal Behavior Lunch Bunch, and the Sheldon Lab group for their valuable feedback. All artwork in figures was created by Megan Bishop. This research was made possible by support to S.C.K. from the Cornell Lab of Ornithology and Department of Neurobiology and Behavior.

WORKS CITED

1. Danchin, É., Giraldeau, L. A., Valone, T. J., and Wagner, R. H. (2004). Public information: from nosy neighbors to cultural evolution. Science 305, 487-491.
2. Boyd, R., and Richerson, P. J. (1985). Culture and the evolutionary process (Chicago: University of Chicago Press).
3. Laland, K. N. (2004). Social learning strategies. Learn. Behav. 32, 4-14.
4. Hoppitt, W., and Laland, K. N. (2013). Social learning: an introduction to mechanisms, methods, and models (Princeton: Princeton University Press).
5. Griffin, A. S. (2004). Social learning about predators: a review and prospectus. Learn. Behav. 32, 131-140.
6. Carthey, A. J., and Blumstein, D. T.
(2017). Predicting predator recognition in a changing world. Trends Ecol. Evol.
7. Whiten, A. (2000). Primate culture and social learning. Cog. Sci. 24, 477-508.
8. Slagsvold, T., and Wiebe, K. L. (2011). Social learning in birds and its role in shaping a foraging niche. Proc. Biol. Sci. 366, 969.
9. Seppänen, J. T., Forsman, J. T., Mönkkönen, M., and Thomson, R. L. (2007). Social information use is a process across time, space, and ecology, reaching heterospecifics. Ecology 88, 1622-1633.
10. Goodale, E., Beauchamp, G., Magrath, R. D., Nieh, J. C., and Ruxton, G. D. (2010). Interspecific information transfer influences animal community structure. Trends Ecol. Evol. 25, 354-361.
11. Avarguès-Weber, A., Dawson, E. H., and Chittka, L. (2013). Mechanisms of social learning across species boundaries. J. Zool. 290, 1-11.
12. Templeton, C. N., and Greene, E. (2007). Nuthatches eavesdrop on variations in heterospecific chickadee mobbing alarm calls. Proc. Natl. Acad. Sci. USA 104, 5479-5482.
13. Magrath, R. D., Pitcher, B. J., and Gardner, J. L. (2007). A mutual understanding? Interspecific responses by birds to each other's aerial alarm calls. Behav. Ecol. 18, 944-951.
14. Magrath, R. D., Haff, T. M., McLachlan, J. R., and Igic, B. (2015). Wild birds learn to eavesdrop on heterospecific alarm calls. Curr. Biol. 25, 2047-2050.
15. Potvin, D. A., Ratnayake, C. P., Radford, A. N., and Magrath, R. D. (2018). Birds learn socially to recognize heterospecific alarm calls by acoustic association. Curr. Biol. 28, 2632-2637.
16. Nagell, K., Olguin, R. S., and Tomasello, M. (1993). Processes of social learning in the tool use of chimpanzees (Pan troglodytes) and human children (Homo sapiens). J. Comp. Psychol. 107, 174.
17. Aplin, L. M., Farine, D. R., Morand-Ferron, J., Cockburn, A., Thornton, A., and Sheldon, B. C. (2015). Experimentally induced innovations lead to persistent culture via conformity in wild birds. Nature 518, 538-541.
18. Catchpole, C. K., and Slater, P. J. (2003).
Bird song: biological themes and variations (Cambridge: Cambridge University Press).
19. Page, R. A., and Ryan, M. J. (2006). Social transmission of novel foraging behavior in bats: frog calls and their referents. Curr. Biol. 16, 1201-1205.
20. Seyfarth, R. M., Cheney, D. L., and Marler, P. (1980). Monkey responses to three different alarm calls: evidence of predator classification and semantic communication. Science 210, 801-803.
21. Blumstein, D. T., and Armitage, K. B. (1997). Alarm calling in yellow-bellied marmots: I. The meaning of situationally variable alarm calls. Anim. Behav. 53, 143-171.
22. Manser, M. B. (2001). The acoustic structure of suricates' alarm calls varies with predator type and the level of response urgency. Proc. Biol. Sci. 268, 2315-2324.
23. Magrath, R. D., Haff, T. M., Fallow, P. M., and Radford, A. N. (2015). Eavesdropping on heterospecific alarm calls: from mechanisms to consequences. Biol. Rev. 90, 560-586.
24. Smith, J. M. (1965). The evolution of alarm calls. Am. Nat. 99, 59-63.
25. Trivers, R. L. (1971). The evolution of reciprocal altruism. Q. Rev. Biol. 46, 35-57.
26. Fallow, P. M., Gardner, J. L., and Magrath, R. D. (2011). Sound familiar? Acoustic similarity provokes responses to unfamiliar heterospecific alarm calls. Behav. Ecol. 22, 401-410.
27. Huang, X., Metzner, W., Zhang, K., Wang, Y., Luo, B., Sun, C., Tinglei, J., and Feng, J. (2018). Acoustic similarity elicits responses to heterospecific distress calls in bats (Mammalia: Chiroptera). Anim. Behav. 146, 143-154.
28. Magrath, R. D., Pitcher, B. J., and Gardner, J. L. (2009). Recognition of other species’ aerial alarm calls: speaking the same language or learning another? Proc. Biol. Sci. 276, 769-774.
29. Hollen, L. I., and Radford, A. N. (2009). The development of alarm call behaviour in mammals and birds. Anim. Behav. 78, 791-800.
30. Colombelli-Negrel, D., Hauber, M. E., Robertson, J., Sulloway, F. J., Hoi, H., Griggio, M., and Kleindorfer, S. (2012).
Embryonic learning of vocal passwords in superb fairy-wrens reveals intruder cuckoo nestlings. Curr. Biol. 22, 2155-2160.
31. Dutour, M., Léna, J. P., Dumet, A., Gardette, V., Mondy, N., and Lengagne, T. (2019). The role of associative learning process on the response of fledgling great tits (Parus major) to mobbing calls. Anim. Cogn. 22, 1095-1103.
32. Wheeler, B. C., Fahy, M., and Tiddi, B. (2019). Experimental evidence for heterospecific alarm signal recognition via associative learning in wild capuchin monkeys. Anim. Cogn. 1-9.
33. Svensson, L. (1992). Identification guide to European passerines (BTO: Thetford, UK).
34. Xeno-canto. https://www.xeno-canto.org.
35. Audacity 2.1.1. The Audacity Team (2015). http://audacityteam.org.
36. Peake, T. M., Terry, A. M. R., McGregor, P. K., and Dabelsteen, T. (2002). Do great tits assess rivals by combining direct experience with information gathered by eavesdropping? Proc. Biol. Sci. 269, 1925-1929.
37. Templeton, C. N., Zollinger, S. A., and Brumm, H. (2016). Traffic noise drowns out great tit alarm calls. Curr. Biol. 26, 1173-1174.
38. Vedder, O., Bouwhuis, S., and Sheldon, B. C. (2014). The contribution of an avian top predator to selection in prey species. J. Anim. Ecol. 83, 99-106.
39. Gentle, L. K., and Gosler, A. G. (2001). Fat reserves and perceived predation risk in the great tit, Parus major. Proc. Biol. Sci. 268, 487-491.
40. Voelkl, B., Firth, J. A., and Sheldon, B. C. (2016). Nonlethal predator effects on the turn-over of wild bird flocks. Sci. Rep. 6, 33476.
41. Carlson, N. V., Healy, S. D., and Templeton, C. N. (2017). A comparative study of how British tits encode predator threat in their mobbing calls. Anim. Behav. 125, 77-92.
42. Therneau, T. M. (2018). coxme: Mixed Effects Cox Models. R package version 2.2-10.
43. Arnholt, A. M., and Evans, B. (2017). BSDA: Basic Statistics and Data Analysis. R package version 1.2-0.
44. R Core Team. (2018). R: A language and environment for statistical computing.
(R Foundation for Statistical Computing).
45. Krebs, J. R. (1973). Social learning and the significance of mixed-species flocks of chickadees (Parus spp.). Can. J. Zool. 51, 1275-1288.
46. Lima, S. L., and Dill, L. M. (1990). Behavioral decisions made under the risk of predation: a review and prospectus. Can. J. Zool. 68, 619-640.
47. Carlson, N. V., Pargeter, H. M., and Templeton, C. N. (2017). Sparrowhawk movement, calling, and presence of dead conspecifics differentially impact blue tit (Cyanistes caeruleus) vocal and behavioral mobbing responses. Behav. Ecol. Sociobiol. 71, 133.
48. Galef, B. G., and Laland, K. N. (2005). Social learning in animals: empirical studies and theoretical models. AIBS Bull. 55, 489-499.
49. Webster, M. M., and Laland, K. N. (2008). Social learning strategies and predation risk: minnows copy only when using private information would be costly. Proc. Biol. Sci. 275, 2869-2876.
50. Curio, E., Ernst, U., and Vieth, W. (1978). Cultural transmission of enemy recognition: one function of mobbing. Science 202, 899-901.
51. Vieth, W., Curio, E., and Ernst, U. (1980). The adaptive significance of avian mobbing. III. Cultural transmission of enemy recognition in blackbirds: cross-species tutoring and properties of learning. Anim. Behav. 28, 1217-1229.
52. Griffin, A. S., Blumstein, D. T., and Evans, C. S. (2000). Training captive-bred or translocated animals to avoid predators. Conserv. Biol. 14, 1317-1326.
53. Popper, N., and Fay, R. (1997). Evolution of the ear and hearing: issues and questions. Brain Behav. Evol. 50, 213-221.
54. Manley, G. A., and Gleich, O. (1992). Evolution and specialization of function in the avian auditory periphery. In The Evolutionary Biology of Hearing. (Springer: New York), pp. 561-580.
55. Farine, D. R., Aplin, L. M., Sheldon, B. C., and Hoppitt, W. (2015). Interspecific social networks promote information transmission in wild songbirds. Proc. Biol. Sci. 282, 1803.
CHAPTER 2

A MACHINE LEARNING APPROACH FOR CLASSIFYING AND QUANTIFYING ACOUSTIC DIVERSITY

Sara Keen (1,2,3), Karan Odom (3), Marcelo Araya-Salas (4), Mike Webster (2,3), Timothy F. Wright (5)

(1) Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, 14850, USA.
(2) Department of Neurobiology and Behavior, Cornell University, Ithaca, NY, 14850, USA.
(3) Cornell Lab of Ornithology, Cornell University, Ithaca, NY, 14850, USA.
(4) Sede del Sur, Universidad de Costa Rica, Golfito, 60701, Costa Rica.
(5) Department of Biology, New Mexico State University, Las Cruces, NM 88003, USA.

ABSTRACT

1. Assessing diversity of discretely varying behavior is a classical ethological problem. In particular, the challenge of calculating an individual’s or species’ repertoire size is often an important step in ecological and behavioral studies, but a reproducible and broadly applicable method for accomplishing this task is not currently available to researchers.
2. We offer a generalizable method to automate the calculation and quantification of acoustic diversity using an unsupervised random forest framework. We tested our method using natural and synthetic data sets of known repertoire sizes that exhibit variation in common acoustic features and in recording quality, which allowed us to evaluate performance using signals with standardized variation. We tested two approaches to estimate acoustic diversity using the output from unsupervised random forest analyses: (i) cluster analysis to estimate the number of discrete acoustic signals (e.g., repertoire size) and (ii) an estimation of acoustic area in acoustic feature space, as a proxy for repertoire size.
3. Generally, we find that our unsupervised analyses classify acoustic structure with high accuracy. We also find that both approaches to estimate acoustic diversity offer robust means of estimating the number of discrete elements in scenarios when repertoire size is small to intermediate (5-20 unique elements).
However, for larger data sets (20-100 unique elements), we find that calculating the size of the area occupied in acoustic space is a more reliable proxy for estimating repertoire size.
4. We conclude that our implementation of unsupervised random forest analysis offers a generalizable tool that researchers can apply to classify acoustic structure of diverse data sets. Additionally, output from these analyses can be used to compare the distribution and diversity of signals in acoustic space, creating opportunities to quantify and compare the amount of acoustic variation among individuals, populations, or species in a standardized way.

INTRODUCTION

Many animals use vocal signals to transmit information and mediate a wide range of social behaviors, from resource competition to attracting mates (Payne et al. 1986, Kroodsma and Miller 1996, Gerhardt and Huber 2002, Catchpole and Slater 2003, Janik 2009). Owing to the ubiquity and ecological importance of acoustic signaling, quantifying and comparing animal vocalizations is a major part of animal behavior and communication systems research. Data from several studies suggest that signals often fall into distinct categories based on their acoustic structure (e.g. birds, Kroodsma 1982; cetaceans, Janik 2009; primates, Owren et al. 1992). Such categories are often observed at the species level when conspecifics use a shared repertoire of distinct acoustic signals that are associated with different contexts (Marler 1982, Seyfarth and Cheney 2003). Distinct categories can also arise within a signal type, as when an individual uses several signal variants that have the same functional role (e.g., the song repertoires of many songbirds comprise multiple song types, Catchpole and Slater 2003). Classifying or quantifying variation in animal signals is fundamental to many questions in animal communication.
For example, metrics derived from measuring the number of unique elements or vocalizations produced by an individual, such as repertoire size and acoustic diversity, have been shown to correlate with indicators of quality, including territory size, cognitive ability, brain morphology, and levels of stress during early stages of development (Sewall et al. 2013, DeVoogd et al. 1993, Podos et al. 2009). At the population level, differences in acoustic signals can facilitate species recognition (e.g., amphibians, Ryan 1985) and can play an important role in speciation by promoting isolation between sympatric groups (e.g., crickets, Mullen et al. 2007; birds, Mason et al. 2017). When assessing entire ecosystems, acoustic diversity, or the amount of variation within and among populations’ or communities’ vocal repertoires, serves as a commonly used metric for assessing ecosystem health or demographic aspects of communities (Sueur et al. 2008, Laiolo et al. 2008, Pijanowski et al. 2011). For these reasons, quantifying acoustic diversity is often an important step in addressing questions and testing hypotheses regarding the social and ecological factors influencing signal function and evolution. Classifying signals is often difficult or time consuming because acoustic variation across environments, individuals, or even different renditions of a signal by the same individual can be considerable. Furthermore, not all variation in acoustic structure is discrete, and graded variation can be difficult to classify (Wadewitz et al. 2015). Within behavioral ecology, a common approach for quantifying variation among signals is to estimate repertoire size or element diversity. In this study, we consider diversity as the number of discrete vocalization types or elements used by an individual or species (this differs from the ecological definition of diversity, which describes both the number and evenness of entities in the environment).
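The distinction between these two definitions can be made concrete with a short Python sketch. The first function counts discrete types, which is the sense of diversity used in this study; the second computes the Shannon index, one common ecological measure that also reflects evenness. The function names and the example label sequences are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def repertoire_size(labels):
    """Number of distinct vocalization types (diversity as used here)."""
    return len(set(labels))


def shannon_index(labels):
    """Shannon diversity H = -sum(p_i * ln p_i), sensitive to evenness."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Two hypothetical birds that each use four song types, A to D.
even_singer = ["A", "B", "C", "D"] * 5            # types used equally often
skewed_singer = ["A"] * 17 + ["B", "C", "D"]      # one type dominates

print(repertoire_size(even_singer), repertoire_size(skewed_singer))
print(shannon_index(even_singer), shannon_index(skewed_singer))
```

Both hypothetical birds have the same repertoire size, but the Shannon index is lower for the bird that sings one type most of the time.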
While it is theoretically possible to count every discrete acoustic element in a data set of vocal elements, for animals with large repertoire sizes it is common to subsample a species repertoire and use either accumulation curves or a capture-recapture analysis to estimate repertoire size (Wildenthal 1965, Garamszegi et al. 2002, Catchpole and Slater 2003, Garamszegi et al. 2005). However, this approach requires first manually classifying elements or vocalizations, a process that can be subjective and vary among observers, and may become unwieldy or even nearly impossible for species with large repertoires or multispecies studies. Applying these approaches can also be complicated by the tendency of subsampling to result in biased measurements in some data sets (Botero et al. 2008). In recent years, several techniques have been developed which improve upon these methods (e.g., Peshek and Blumstein 2011; Kershenbaum et al. 2015), including methods that use information theory to quantify individuality of vocal signals (Beecher 1989, Freeberg and Lucas 2012, Linhart et al. 2019). Additionally, methods have been developed to help distinguish among more graded element types (e.g., Wadewitz et al. 2015). Nevertheless, the general challenge of quantifying repertoire size still exists with many of these methods: human-based classification is both time intensive and unavoidably subjective. In passive acoustic monitoring and quantification of soundscapes, there is an emphasis on creating fully automated approaches for classification and measurement of acoustic signals. One such approach, acoustic indices, has been used to quantify variation at scales ranging from the ecosystem level to individual behavior (Sueur et al. 2014). Studies suggest that ecosystem acoustic diversity indices may be indicators of biodiversity (Sueur et al. 2008a, Harris et al. 2016), degree of functional and/or phylogenetic diversity within a community (Gasc et al.
2013), and a proxy for local vocal activity (Pieretti et al. 2011). These metrics have become increasingly important to ecological assessment and monitoring (Gibb et al. 2019); however, they are often calculated at scales that are more appropriate to ecosystem or community ecology. Unlike soundscape analysis, measuring acoustic diversity on the species- or individual-level requires quantifying differences between discrete elements. Machine learning offers an automated and objective approach for such classification tasks, and is a powerful tool for detecting and distinguishing among vocal signals from different species (e.g., Acevedo et al. 2009, Briggs et al. 2013, Hershey et al. 2017, Stowell et al. 2019). In particular, unsupervised machine learning approaches offer several advantages that enhance their value for assessing behavioral diversity, namely that they do not require a labeled training data set or a priori assumptions about the structure of data (Valletta et al. 2017). Unsupervised techniques can also determine which acoustic parameters contribute most to classification or splitting data into classes, therefore relieving researchers from the need to make potentially subjective choices about feature selection (Breiman 2001). Unsupervised analyses have shown high performance in the classification of vocal signals to species as compared to other approaches (Keen et al. 2014), including in the case of large data sets (Stowell and Plumbley 2014), and there appears to be much promise in applying these techniques to evaluate acoustic diversity (Ulloa et al. 2018). However, a widely applicable tool for assessing acoustic diversity at the levels of individual, species, or communities is not readily available. In this paper, we present and evaluate the use of unsupervised machine learning for classifying and quantifying acoustic diversity in animal signals.
Specifically, we examine two approaches for estimating repertoire size: (1) a clustering method to identify discrete numbers of acoustic units and (2) an acoustic area calculation as a proxy for repertoire size. We evaluate the accuracy of these approaches on multiple data sets with known varying acoustic structure. Three unique aspects of our approach help ensure this method will be highly generalizable to diverse acoustic signals. First, we test algorithm performance using both field-recorded and synthesized acoustic data sets with known sample sizes and variation, making it possible to evaluate the usefulness of our method under a variety of conditions. Second, we incorporate several of the most commonly used acoustic parameters for characterizing signal structure. Third, we use test data sets with realistic distributions of variation and background noise, making it possible to evaluate the robustness of this approach to variable acoustic structures and across a range of recording scenarios. Together, these steps allow us to rigorously evaluate performance and provide recommendations about application in different scenarios. Based on our results, we believe this technique offers a powerful tool for researchers to quantify acoustic diversity across taxa and communities.

METHODS

We estimated acoustic diversity for a collection of natural and synthetic acoustic signals using a machine learning approach (random forest) and evaluated the performance of this method following the workflow in Figure 1. This process involved creating sets of synthetic acoustic signals with known repertoire sizes and known amounts of structural variation, extracting acoustic features from these signals, running unsupervised random forest analyses to calculate pairwise distances between signals, and estimating repertoire size using both cluster analysis and the size of the acoustic feature space (hereafter referred to as acoustic space).
In addition, we evaluated how variation in repertoire size and acoustic structure affects the accuracy of supervised random forest.

Figure 1. Flowchart of study design.

Using a random forest approach was integral to our workflow for several reasons. A key advantage of random forest is its ability to determine which feature measurements best divide data into distinct categories; therefore, it is possible to use a large number of features and allow the algorithm to determine which are most useful for a given data set. Random forest also offers several additional advantages over other machine learning techniques: it is robust to collinearity, outliers and unbalanced data sets, is efficient even with large and highly multi-dimensional data sets, can be used in both a supervised and unsupervised manner, can handle non-monotonic relationships, ignores non-informative variables, produces low bias estimates, computes proximity of observations which can be used for representing trait spaces, and can be used to identify variables that contribute most to finding structure within a data set (Valletta et al. 2017). For these reasons, combining random forest with a large suite of automated acoustic feature measurements holds much promise as a generalizable tool for acoustic classification tasks.

Test data sets. We evaluated the performance of our proposed method using four data sets: annotated field recordings of long-billed hermits (Phaethornis longirostris), annotated lab recordings of budgerigars (Melopsittacus undulatus), and two collections of synthetic data sets that were modeled on natural vocalizations of these two species (see Table 1 for a summary of data sets and Figure 2 for sample spectrograms). This enabled us to assess performance using vocal signals collected from live birds that reflect the naturally occurring variation between individuals as well as with signals that have a priori defined discrete variation.
The use of synthetic data sets as test cases also allowed us to conduct repeated tests of algorithm performance under different conditions.

Field recordings of long-billed hermits were collected from known individuals in wild populations at La Selva Biological Station, Costa Rica (10°25' N, 84°00' W), between 2008 and 2017. Males in this species live in territorial leks that exhibit local songs that are shared by subgroups of individuals (i.e., singing neighborhoods) within a lek (Araya-Salas and Wright 2013). For this study, we used songs recorded from 16 leks (mean ± SE songs per group = 3.1 ± 0.51). Because the song types used by long-billed hermits change over time, it was possible to use songs recorded from the same lek in different years to compile a sample of 50 unique song types. We verified that song types exhibited distinct spectro-temporal structures using spectrograms created in the R package warbleR (Araya-Salas and Smith-Vidaurre 2017) (see Figure B1 for spectrograms). To create the test data set for this study, we identified the 50 song types that had the most samples, and selected the 10 recordings with the highest signal-to-noise ratio for each type, yielding a data set of 500 signals.

Laboratory recordings of budgerigar contact calls were collected between July and November 2010 from a laboratory population originally acquired from a captive breeder. Individual budgerigars typically have repertoires of 2-5 acoustically distinct contact call types that are shared with some other individuals within their flock. Contact calls were recorded from 38 different individuals that were temporarily isolated from their flock mates in a homemade acoustic chamber constructed of an Igloo cooler lined with acoustic foam with a clear plexiglass door as described in Dahlin et al. (2014).
In order to promote calling during recording sessions, we played recordings of unfamiliar budgerigar vocalizations at low amplitudes and also ensured that isolated individuals were in visual contact with their flock mates. Calls were recorded during 30 min sessions that occurred twice per week using an Audio-Technica Pro 37 microphone input to a Dell DHMPC running Syrinx 2.6 (Burt 2006) with a 22.05 kHz sampling rate. Calls were automatically partitioned and saved to separate wav files by Syrinx. Trained research assistants visually assessed spectrograms made from wav files and assigned calls to classes using Raven 1.3 (Cornell Lab of Ornithology). Call classification was subsequently verified using a discriminant function analysis as described in Dahlin et al. (2014). To select the calls used in this study, we randomly selected 35 contact calls from each of 15 unique call types, resulting in a data set of 525 signals.

Synthetic data set creation. To create the synthesized song data sets used for testing, we first extracted the dominant frequency contours of the natural bird vocalizations (long-billed hermit songs and budgerigar calls). We then modeled these time series of frequency values using autoregressive moving average (ARMA) models. Briefly, these models find the maximum likelihood estimates of the parameters in a polynomial equation predicting the variation in a time series. These parameters can later be used to simulate new time series, or, in our case, new dominant frequency contours for generating synthetic vocalizations. ARMA model parameters were estimated for each natural data set independently and later used to simulate frequency contours resembling those original data sets. These contours were converted into audio clips using the R soundgen package (Anikin 2019).
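To make the simulation step concrete, the following is a minimal sketch of how a frequency contour could be drawn from an ARMA process once its coefficients are known. It is written in Python for illustration and is not the R workflow used in this study. The function name `simulate_arma_contour`, the coefficient values, the 4000 Hz mean, and the burn-in length are all hypothetical choices, not values estimated from the recordings.

```python
import random


def simulate_arma_contour(n_points, ar, ma, mean_hz, noise_sd, seed=None):
    """Simulate a dominant-frequency contour (Hz) from an ARMA(p, q) process.

    ar and ma are lists of autoregressive and moving-average coefficients.
    All parameter values passed in are illustrative assumptions.
    """
    rng = random.Random(seed)
    burn_in = 50  # discard early values so the start-up transient fades
    total = n_points + burn_in
    shocks = [rng.gauss(0.0, noise_sd) for _ in range(total)]
    deviations = []  # deviations of the contour from its mean frequency
    for t in range(total):
        value = shocks[t]
        # Autoregressive part: weighted sum of previous deviations.
        for i, coef in enumerate(ar):
            if t - 1 - i >= 0:
                value += coef * deviations[t - 1 - i]
        # Moving-average part: weighted sum of previous random shocks.
        for j, coef in enumerate(ma):
            if t - 1 - j >= 0:
                value += coef * shocks[t - 1 - j]
        deviations.append(value)
    return [mean_hz + d for d in deviations[burn_in:]]


# Example: a 30-point contour fluctuating around 4 kHz (hypothetical values).
contour = simulate_arma_contour(30, ar=[0.8], ma=[0.3],
                                mean_hz=4000.0, noise_sd=50.0, seed=1)
```

In practice the coefficients would come from a model fitted to the extracted contours, and the simulated contour would then be passed to a synthesizer to produce audio.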
We allowed the synthetic sounds to vary in three features: duration (short: 150 ms; long: 300 ms), harmonic content (low and high) and background noise (low: 20 dB signal-to-noise ratio; high: 2 dB signal-to-noise ratio). Duration values were based on the observed variation in long-billed hermit and budgerigar data sets (mean ± SE duration: long-billed hermit songs: 143.32 ± 17.5 ms, budgerigar calls: 138.23 ± 20.4 ms; histograms shown in Figure B2). The natural vocalizations used as templates have very little harmonic content. Hence, harmonic content was simulated arbitrarily as frequency contours an octave (twice the frequency) and a fifth (2.5 times) above the dominant frequency contour. Variation in background noise was generated by adding normally distributed noise (i.e., white noise) to each signal.

In order to adequately test the ability of our method to estimate repertoire size and to determine whether this can be approximated by calculating the area occupied in acoustic space, we used this process to synthesize data sets with repertoire sizes of 5, 10, 15, 20, 50, or 100 unique elements. Each element type was represented by 10 examples. Variation within element types (i.e., between examples) was generated by adding to the simulated frequency contours randomly generated values drawn from a normal distribution with a mean of 0 and a standard deviation equal to a tenth of the standard deviation in frequency for each contour. For each possible repertoire size, we used all possible combinations of duration, harmonic content, and background noise, resulting in 48 synthetic data sets for both long-billed hermit songs and budgerigar calls (see Table 1). See Appendix B for further details of data synthesis. Sample spectrograms of signals from each data set are shown in Figure 2.

Table 1. Summary of test data sets used to evaluate performance.
Data description                     Recording type   Number of data sets   Unique elements in repertoire                     Examples of each element
Long-billed hermit songs             Field            1                     50                                                10
Budgerigar calls                     Laboratory       1                     15                                                35
Synthetic long-billed hermit songs   Synthetic        48                    8 x 5, 8 x 10, 8 x 15, 8 x 20, 8 x 50, 8 x 100    10
Synthetic budgerigar calls           Synthetic        48                    8 x 5, 8 x 10, 8 x 15, 8 x 20, 8 x 50, 8 x 100    10

Figure 2. Spectrograms with examples from each data set. Example spectrograms from acoustic signals used to test algorithm performance from data sets including a) field recordings of long-billed hermit songs, b) lab recordings of budgerigar calls, c) synthetic long-billed hermit songs, d) synthetic budgerigar calls.

Acoustic feature measurements. We collected a suite of acoustic measurements from each audio clip in every test data set. We first applied a 500 Hz high pass filter to all audio clips to remove low frequency noise, and then created spectrograms for each sample clip using a 300-point FFT with a Hann window and 90% overlap. We extracted several common acoustic feature measurements from each signal. These included 181 descriptive statistics of Mel-frequency cepstral coefficients (MFCCs; Lyon and Ordubadi 1982, sensu Salamon et al. 2014) and 28 acoustic parameters using the R packages warbleR and seewave (Araya-Salas and Smith-Vidaurre 2017, Sueur et al. 2008), which included commonly used acoustic measurements such as peak frequency, bandwidth, and duration, as well as robust measurements based on energy distributions. We also calculated two pairwise distance matrices for every data set: one using spectrogram cross correlation (SPCC; Clark et al. 1987) and one using dynamic time warping (DTW; Wolberg 1990). We then used classical multidimensional scaling (MDS) to translate the SPCC and DTW distance matrices into five-dimensional space, and used the axis coordinates for each sample as additional feature measurements (i.e., five SPCC MDS coordinates and five DTW MDS coordinates per sample).
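As an illustration of how a dynamic time warping distance matrix can be built from frequency contours, here is a minimal Python sketch of the textbook DTW recurrence. It is not the implementation used in this study; the function names and the use of absolute frequency difference as the local cost are assumptions made for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost (an illustrative choice).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def pairwise_dtw(contours):
    """Symmetric matrix of DTW distances between all pairs of contours."""
    k = len(contours)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(contours[i], contours[j])
    return dist
```

Because DTW allows one contour to be locally stretched against the other, two signals with the same frequency trajectory but different durations can still have a distance of zero. A matrix produced this way could then be passed to multidimensional scaling to obtain coordinates for each signal.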
Together, this resulted in a vector of 219 feature measurements for each signal. The feature vectors for each audio clip were collated into a single matrix for each data set. We removed any collinear measurements from the matrix, used a Box-Cox transformation to improve normality, and scaled and centered all feature values. The resulting matrix was used as the input into the supervised and unsupervised random forest models.

Supervised random forest analyses. To evaluate the ability of random forest to classify signals into the correct categories, we used a supervised random forest created with the randomForest R package (Liaw and Wiener 2002) to classify the labeled signals in each data set into separate categories. Here, “supervised” denotes that the random forest model was created using a labeled data set. When using a supervised random forest approach, individual decision trees are constructed by splitting data into two groups at each node using a feature measurement chosen from a randomly selected subset of features, with the goal of optimizing the split between labeled classes. Because all data sets were labeled by either human experts (field and lab recordings) or by software (synthetic data), we could then assess how well the supervised random forest models were able to classify signals from the same category together using the out-of-bag error estimate (Breiman 2001). When using a supervised random forest, out-of-bag error is calculated by classifying each sample using only the decision trees that were built without that sample (i.e., trees for which the sample was left out of the bootstrap training set), and then testing whether the sample is assigned to its labeled class. These supervised random forest analyses served as a proof of concept, as they confirmed that models constructed from the selected acoustic features could accurately assign signals to the expected categories.

Unsupervised random forest analyses.
To determine whether our method can be used to estimate repertoire size or acoustic diversity for unlabeled data, we created an unsupervised random forest model for each data set using the randomForest R package (Liaw and Wiener 2002). Unlike the supervised random forest approach, an unsupervised random forest uses unlabeled samples to create a collection of decision trees by optimally splitting the distribution of values for a randomly selected feature measurement at each node. Unsupervised random forests are often used with the goal of finding underlying structure within data (Breiman 2001). This is possible with unlabeled data because decision trees assign all samples to end nodes, i.e., different classes, and one can then calculate the proximity of any pair of samples within a data set as the proportion of trees in which the two samples are placed in the same end node; pairwise distance follows as the complement of this proximity. For this study, each unsupervised random forest model was constructed using 10,000 decision trees that were built using the unlabeled feature measurements from each data set. We then used the output of each unsupervised model to obtain pairwise distances between all samples within each data set.

Performance evaluation. We used several metrics to evaluate how well our method could assign unlabeled signals into different classes. First, we assessed performance of each supervised random forest model by calculating out-of-bag error rates, which provided a misclassification rate for each data set. Using these values, we examined whether duration of audio clips (long vs. short), harmonic content (high vs. low), level of background noise (high vs. low), or number of discrete elements influenced the ability of models to assign signals to the correct class.
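The pairwise distances used in the unsupervised evaluation come from the end-node assignments described under the unsupervised random forest analyses. The sketch below, in Python for illustration only, shows one way such distances could be derived from a table of end-node identifiers. The function name, the input layout, and the choice of one minus proximity as the distance are assumptions; other transformations of proximity are also in use.

```python
def proximity_distance(leaf_ids):
    """Convert end-node assignments into a pairwise distance matrix.

    leaf_ids[t][s] is the identifier of the end node reached by sample s
    in tree t (a hypothetical input layout). Proximity is the fraction of
    trees in which two samples share an end node; distance is taken here
    as 1 - proximity, which is an assumed convention.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    dist = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            shared = sum(1 for tree in leaf_ids if tree[i] == tree[j])
            dist[i][j] = dist[j][i] = 1.0 - shared / n_trees
    return dist


# Four toy trees and three samples (made-up end-node identifiers).
toy_leaves = [[1, 1, 2],
              [3, 3, 3],
              [5, 6, 6],
              [2, 2, 4]]
toy_dist = proximity_distance(toy_leaves)
```

In this toy case samples 0 and 1 share an end node in three of four trees, so their distance is 0.25, whereas samples 0 and 2 share an end node in only one tree, giving 0.75.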
We evaluated how well the unsupervised analysis could measure acoustic diversity using two approaches: by estimating the number of unique elements (i.e., repertoire size) in each data set and by calculating the area of the acoustic space occupied by all signals in a data set. To estimate repertoire size, we applied partitioning around medoids to the pairwise distance matrix returned by the unsupervised random forests for each data set (Kaufman and Rousseeuw 2009). For each data set, we calculated silhouette width to determine the optimal number of clusters, and then calculated the difference between this value (the estimated repertoire size) and the true repertoire size. For each data set, we also calculated the classification accuracy by assigning each cluster a label corresponding to the signal type that was most frequently placed in that cluster, and then dividing the total number of correctly assigned samples by the number of samples in the data set. We also calculated the adjusted Rand index for each data set, which is a metric of how often samples of the same type are assigned to the same cluster, and different types assigned to different clusters (Rand 1971). To create the acoustic space, we first applied non-metric multidimensional scaling to the pairwise distance matrix produced by the unsupervised random forest created for each data set. We then calculated acoustic area as the 95% minimum convex polygon of these points (i.e., excluding the most outlying 5% of points). We then used Spearman’s rank correlation to test whether acoustic area increased with true repertoire size. We ran these analyses on the four collections of data sets described above. Lastly, in order to visualize how well the unsupervised analyses clustered distinct signal types in our data sets, we used the t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction technique to display all samples in two dimensions (van der Maaten and Hinton 2008).
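The two clustering metrics described above can be sketched as follows. This is an illustrative Python version based on the standard definitions, not the code used for the analyses; the function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def majority_label_accuracy(true_labels, cluster_ids):
    """Label each cluster by its most frequent true type, then score."""
    by_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return correct / len(true_labels)


def adjusted_rand_index(true_labels, cluster_ids):
    """Adjusted Rand index computed from the contingency table of labels."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, cluster_ids))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(cluster_ids).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2          # agreement if partitions match
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)


# Toy example: six signals of two types split unevenly into two clusters.
truth = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 1, 1]
accuracy = majority_label_accuracy(truth, clusters)
ari = adjusted_rand_index(truth, clusters)
```

Note that majority-label accuracy can remain high even when a signal type is split across several clusters, because each of those clusters is still labeled with the correct type. The adjusted Rand index penalizes such splitting, which is why the two metrics are informative together.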
All statistical analyses were conducted using the R packages cluster, tsne, MASS, and adehabitatHR (Maechler et al. 2019, Donaldson 2016, Venables and Ripley 2002, Calenge 2006).

RESULTS

Supervised random forest performance. Out-of-bag error was below what would be expected by chance for all supervised random forest models: field recordings of long-billed hermits: 0.04; lab recordings of budgerigars: 0.053; synthetic long-billed hermit data sets (mean ± SE): 0.02 ± 0.043; synthetic budgerigar data sets: 0.049 ± 0.017 (see Appendix B for further details). However, we observed that certain signal characteristics in our synthetic data sets influenced error rates. Namely, synthetic long-billed hermit songs that had low harmonic content or high background noise had higher out-of-bag error rates, and error rates were typically higher in long-billed hermits than in budgerigars. Synthetic data sets that had higher numbers of discrete element types also had higher out-of-bag error rates (Figure 3). Variable importance rankings indicating which feature measurements were most useful in splitting data into distinct classes were different for each of the four data set types used for testing (Table B1).

Figure 3. Out-of-bag error rates for supervised random forest models created for synthetic data sets with varying a) duration, b) harmonic content, c) levels of background noise. Black violin plots show results for synthetic budgerigar data sets and gray plots show results for synthetic long-billed hermit data sets.

Unsupervised random forest performance and calculating acoustic diversity. Using cluster analysis to evaluate repertoire size, we observed that our estimates of repertoire size were most accurate for synthetic data sets that contained 20 or fewer unique elements (Figure 4a).
Classification accuracy was often above 90% for data sets with five unique elements, and decreased as the true number of discrete elements in a data set increased, reaching around 60% for data sets with 100 unique elements (Figure 4b). Similarly, adjusted Rand indices were relatively high for synthetic data sets with small numbers of unique elements, and decreased among data sets as the number of unique elements increased (Figure 4c). An exception to this pattern was the synthetic budgerigar data sets with five unique elements, which had lower adjusted Rand indices because data were often clustered into fewer than five classes. The scatter plots in Figure 5a-d illustrate the ability of the unsupervised analysis to cluster synthetic signals of the same class together.

Figure 4. Unsupervised performance varies with number of unique elements in synthetic data sets. Plots of results from cluster analysis of unsupervised random forest output showing a) estimated repertoire size, b) classification accuracy, c) adjusted Rand index versus true repertoire size. White and black boxes represent results from synthetic budgerigar calls and synthetic long-billed hermit songs, respectively.

The unsupervised analysis of live budgerigar calls using cluster analysis correctly estimated that there were 15 unique signal types in the data set. However, calls from the same truth class were not always assigned to the same cluster (Figure 4c), which is reflected by the classification accuracy of 79.0% and adjusted Rand index of 0.602. The unsupervised analysis of field-recorded long-billed hermit songs incorrectly estimated 75 unique signal types in the data set, which was the maximum allowed number of clusters during our testing, rather than the true number of 50 unique signal types. However, the classification accuracy for this data set was 78.2%, and the adjusted Rand index was 0.776, indicating that signals of the same class were often clustered together.
Scatter plots showing the unsupervised clustering of both live bird data sets are shown in Figure 5e,f.

Figure 5. Example scatter plots from unsupervised clustering of data sets. a) synthetic budgerigar data set with 20 unique elements, short duration, low harmonic content, and low background noise (clustered into 21 groups), b) synthetic long-billed hermit data set with 20 unique elements, short duration, high harmonic content, and low background noise (clustered into 20 groups), c) synthetic budgerigar data set with 50 unique elements, long duration, low harmonic content, and low background noise (clustered into 47 groups), d) synthetic long-billed hermit data set with 50 unique elements, short duration, high harmonic content, and low background noise (clustered into 47 groups), e) lab data set of budgerigar calls with 15 unique elements (clustered into 15 groups), f) field data set of long-billed hermit songs with 50 unique elements (clustered into 75 groups). We used the t-SNE dimensionality reduction technique to display all data points in two dimensions. Axes represent acoustic space, points represent single audio samples, and point colors and shapes represent samples of the same element type.

When acoustic area was used to estimate repertoire size, we observed a significant, positive correlation between acoustic area and the number of discrete elements. In addition, the acoustic area metric estimated repertoire size with similar accuracy across all values of true repertoire size (Figure 6). We observed this same pattern for synthetic data sets of long-billed hermit songs and budgerigar calls (Spearman correlation: budgerigars: r = 0.91, N = 99, p < 0.0001, long-billed hermits: r = 0.95, N = 99, p < 0.0001; Figure 6).

Figure 6. Datasets with more discrete elements have larger distributions in acoustic space.
As repertoire size increases, the distribution of samples in acoustic space occupies a larger area for a) synthetic budgerigar calls, b) synthetic long-billed hermit songs. Acoustic space values have been squared to better illustrate differences between values on a small scale.

DISCUSSION

Our goal was to provide researchers with a flexible, unsupervised method for quantifying diversity in acoustic signals, a general problem encountered when evaluating the vocal repertoires of individuals, populations, or species. We aimed to replicate the process researchers might use when assessing variation in their own unlabeled data sets. We find that unsupervised learning paired with either cluster analyses or acoustic area calculations can approximate small and intermediate repertoire sizes well. In cases in which the number of discrete elements in a data set is large, however, quantifying the size of the area occupied in acoustic space may offer a more accurate alternative to estimating repertoire size with cluster analyses. Below, we make specific recommendations about which signal characteristics might influence the accuracy of estimating acoustic diversity under different conditions, repertoire sizes, and acoustic features.

Supervised random forest performance. Supervised random forest analyses allowed us to verify that random forest analysis can accurately identify underlying patterns in acoustic data. We assessed the efficacy of this process and confirmed that our test data sets had the expected structure. Our results suggest that signal duration (short vs. long) and harmonic content (low vs. high) largely do not affect classification accuracy in most cases (Figure 3). Interestingly, synthetic long-billed hermit songs that had low harmonic content or high background noise suffered from higher out-of-bag error. Additionally, in almost all cases, synthetic long-billed hermit songs exhibited higher out-of-bag error rates than synthetic budgerigar calls.
A likely explanation is that the harmonic content of natural long-billed hermit songs provides physical acoustic structure that aids in classification among element types, and low power content in harmonic bands of our synthetic songs, or high background noise, may mask this helpful feature. Harmonic structure is known to encode individual identity in some species’ vocalizations (e.g., penguins, Aubin et al. 2000; humans, Imperl et al. 1997). The energy distribution of songs may be a salient feature that allows both conspecifics and automated approaches to better discern fine differences in signal structure. Therefore, harmonic structure is likely an important feature to capture in field recordings and feature measurements when it exists in natural vocalizations. As for the higher classification error for hermit elements in general, it is possible that the feature measurements we used might not be as effective at identifying the spectrotemporal variation for this species compared to budgerigars. Alternatively, focusing on frequency contours for representing variation in signal structure might miss other important features that help to distinguish between types, such as subtle variation in harmonic structure. Hence, it is likely that our simulation underestimated the overall discriminatory power of the methods. For both classes of synthetic data sets, we observed that error rates increased with true repertoire size, suggesting that the method is less effective at finding structure in data when there are large numbers of discrete elements. This decrease in discriminatory power with increasing repertoire size might be due to a saturation of the acoustic space.

Unsupervised random forest performance. Cluster analysis using output from unsupervised random forest models showed that it was possible to estimate the true number of discrete elements in synthetic data sets with little error when the number of discrete elements was equal to or less than 20 (Figure 4a).
For data sets that had 50 or 100 discrete elements, the unsupervised clustering technique often estimated repertoire size as being much higher than its true value. One possible reason for this may be overfitting during clustering, i.e., when subsets of samples of the same signal type are assigned to separate clusters, which can occur when there is high similarity among a subset of samples in a class. Additionally, higher inaccuracy is expected as more unique elements are introduced and the acoustic space becomes saturated. Classification accuracy and adjusted Rand indices were also higher for data sets with few discrete elements, and both metrics were consistently slightly higher for synthetic long-billed hermit data sets relative to synthetic budgerigar data sets (Figure 4b,c). This might be explained by the fact that the synthetic long-billed hermit songs exhibit more pronounced differences between classes than the synthetic budgerigar calls (Figure B1), which might allow for classes to be more easily distinguished.

Our second approach of quantifying acoustic diversity by calculating the size of the area occupied in acoustic space avoids the issue of needing to assign signals to discrete classes. For synthetic budgerigar and long-billed hermit data sets, acoustic area was positively correlated with the number of discrete elements in a data set (Figure 6). Additionally, unlike the clustering approach, acoustic area estimates were robust to large repertoire sizes. We suggest that this may be a useful technique for quantifying diversity in species anticipated to have large repertoires or high element diversity, as it precludes the need for defining discrete categories which may be difficult to define statistically in a crowded acoustic space. We note, however, that making relative comparisons between different data sets requires that all data points are analyzed concurrently; acoustic area has no value or meaning on its own.
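To illustrate what an acoustic area calculation involves, the following Python sketch computes the area of a convex polygon around two-dimensional points after discarding the most outlying fraction. It is not the adehabitatHR routine used in this study. The function names are hypothetical, and ranking points by distance from the centroid before trimming is an assumed outlier rule.

```python
def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def acoustic_area(points, percent=95):
    """Hull area of the `percent` of points closest to the centroid.

    Trimming by centroid distance is an assumption made for this sketch.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    ranked = sorted(points, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    keep = max(3, int(len(ranked) * percent / 100))
    return polygon_area(convex_hull(ranked[:keep]))
```

Because the coordinates come from a scaling of pairwise distances within one analysis, areas computed this way are only comparable among groups of points that were embedded together, which is the caveat raised above.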
Recently, researchers have suggested that using latent acoustic space created by compressing data into fewer dimensions could be a powerful way to cluster discrete vocal signals (Goffinet et al. 2019, Sainburg et al. 2019), but to our knowledge no previous studies have assessed signal diversity by evaluating acoustic space occupancy.

For the natural field and lab recorded data sets, we also observed limitations of the clustering method. Although cluster analysis accurately estimated small repertoire sizes with the synthetic data, for the lab-recorded budgerigar data set, which included only 15 unique element types, signals of the same class were sometimes placed in separate clusters. This could be one shortcoming of using clustering, as the algorithm may not assign the correct labels to every signal in a data set, although we observed that classification accuracy was rather high overall (79%). As with the synthetic data sets that had 50 unique elements, the unsupervised analysis overestimated the repertoire size of the field-recorded long-billed hermit data set of 50 elements, indicating that there were 75 unique elements present. Our results indicate that evaluating acoustic area is a more robust means of assessing acoustic diversity in such scenarios.

The feature measurements that were most useful in the unsupervised random forest approach varied among test data sets, presumably because different signal types were best distinguished by different features (Table B1). The ability of the analysis to detect this latent variation without requiring us to specify a priori which features we expected to vary exemplifies one of the primary strengths of random forest analysis. For this reason alone, we expect this approach may permit a high degree of adaptability to diverse acoustic data sets.
Overall, given the relatively low out-of-bag error rates, we were confident that constructing random forest models in an unsupervised manner would be a useful tool for assessing acoustic diversity.

Potential Uses. Both methods we tested allowed for accurate estimates of repertoire size; however, we see promising attributes and limitations of both approaches. As we pointed out, the cluster analysis was particularly useful for assessing small or intermediate repertoire sizes. Interestingly, previous work has shown that parrot repertoires often contain 10-15 elements (Bradbury, in press) and that most songbird repertoires typically include fewer than 20 elements or song types (MacDougall-Shackleton 1997, Byers and Kroodsma 2009, Snyder and Creanza 2018). Repertoires can refer both to the total signal repertoire in a species (signal ethogram) and to the total number of signals of a certain type within an individual (song repertoire or call repertoire). Here we evaluated performance with individual vocal elements; however, our proposed approach can potentially be applied in both scenarios.

We suggest that both approaches can also be applied to address several ecological questions. Comparisons among species suggest acoustic diversity may correspond to a number of ecological characteristics, including viability of populations (Laiolo et al. 2008), local habitat structure (Morton 1975, Boncoraglio and Saino 2007), as well as social system structure and complexity (Dunbar 1998, Freeberg 2006, elephants, Leighton 2017). Additionally, acoustic diversity within an individual, population, or species is also a key characteristic of animal vocal behavior that has been evaluated in terms of its role in social and sexual signaling (Tobias and Seddon 2009, Wilkins et al. 2013). We envision that acoustic space is an especially promising method to estimate and compare acoustic diversity across individuals, populations, or species.
This method is especially well-suited for large comparative analyses in which little might be known ahead of time about repertoire sizes for individual species and whether they surpass the limit appropriate for cluster analysis. In addition, all species or individuals can be compared in the same acoustic space, allowing for comparable estimates of acoustic area for all species. Lastly, automated procedures are especially beneficial for efficiency and reliability when comparing large numbers of species. Although the analyses presented here were conducted in a two-dimensional acoustic space, future analyses could calculate multi-dimensional acoustic volumes (as opposed to 2-D acoustic areas), at greater computational cost.

Challenges and limitations. The broader challenge of assigning signals to categories is expected to scale in difficulty as the number of classes increases and the acoustic space becomes saturated. This inherent challenge cannot be entirely avoided, but certain aspects of our technique can help to mitigate it, namely by considering how individual vocalizations occupy acoustic space rather than estimating repertoire size. Acoustic space may not correlate linearly with the number of discrete elements in a data set, but we can use this approach to capture differences between large and small repertoires. Additionally, assessing acoustic space may allow researchers to avoid evaluating acoustic niche in a manner that is not meaningful for a study species. Lastly, when using this approach, we recommend that researchers take care to collect high-quality recordings, sample sufficiently to capture the full repertoire to be analyzed, and select features that can adequately capture variation in their data.

Conclusions.
We build upon previous work that has demonstrated the utility of unsupervised analyses for classifying acoustic signals and propose a novel combination of techniques for quantifying vocal diversity and/or measuring differences among individuals, species, and ecosystems. We propose that this method can be applied to estimate repertoire size and to calculate acoustic space occupancy, and both may be used to characterize vocalizations. By testing this method under diverse conditions and facilitating testing using synthetic data, we hope to offer researchers a robust and generalizable method for analyses of vocalizations.

ACKNOWLEDGEMENTS

We thank Holger Klinck, Chris Pelkie, the Cornell Center for Conservation Bioacoustics, and the Cornell Lab of Ornithology for essential support and technical advising while carrying out this project. This work was supported by funding from the Cornell Lab of Ornithology Athena Award, Cornell Sigma Xi research grants, and the Cornell Department for Neurobiology and Behavior.

WORKS CITED

Acevedo, M. A., Corrada-Bravo, C. J., Corrada-Bravo, H., Villanueva-Rivera, L. J., and Aide, T. M. (2009). Automated classification of bird and amphibian calls using machine learning: A comparison of methods. Ecological Informatics, 4: 206-214.
Anikin, A. (2019). Soundgen: An open-source tool for synthesizing nonverbal vocalizations. Behavior Research Methods, 51: 778-792.
Araya-Salas, M., and Smith-Vidaurre, G. (2017). warbleR: an R package to streamline analysis of animal acoustic signals. Methods in Ecology and Evolution, 8: 184-191.
Aubin, T., Jouventin, P., and Hildebrand, C. (2000). Penguins use the two-voice system to recognize each other. Proceedings of the Royal Society of London. Series B: Biological Sciences, 267: 1081-1087.
Beecher, M. D. (1989). Signaling systems for individual recognition: An information-theory approach. Animal Behaviour, 38: 248-261.
Boncoraglio, G., and Saino, N. (2007).
Habitat structure and the evolution of bird song: a meta-analysis of the evidence for the acoustic adaptation hypothesis. Functional Ecology, 21: 134-142.
Bormpoudakis, D., Sueur, J., and Pantis, J. D. (2013). Spatial heterogeneity of ambient sound at the habitat type level: ecological implications and applications. Landscape Ecology, 28: 495-506.
Botero, C. A., Mudge, A. E., Koltz, A. M., Hochachka, W. M., and Vehrencamp, S. L. (2008). How reliable are the methods for estimating repertoire size? Ethology, 114: 1227-1238.
Briggs, F., Lakshminarayanan, B., Neal, L., Fern, X. Z., Raich, R., Hadley, S. J., Hadley, A. S., and Betts, M. G. (2012). Acoustic classification of multiple simultaneous bird species: A multi-instance multi-label approach. The Journal of the Acoustical Society of America, 131: 4640-4650.
Briggs, F., Huang, Y., Raich, R., Eftaxias, K., Lei, Z., Cukierski, W., and Irvine, J. (2013). The 9th annual MLSP competition: new methods for acoustic classification of multiple simultaneous bird species in a noisy environment. In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP). IEEE.
Catchpole, C. K., and Slater, P. J. (2003). Bird song: biological themes and variations. Cambridge University Press.
Clark, C. W., Marler, P., and Beeman, K. (1987). Quantitative analysis of animal vocal phonology: an application to swamp sparrow song. Ethology, 76: 101-115.
Costa, B., Taylor, J. C., Kracker, L., Battista, T., and Pittman, S. (2014). Mapping reef fish and the seascape: using acoustics and spatial modeling to guide coastal management. PLoS One, 9: e85555.
Dahlin, C. R., Young, A. M., Cordier, B., Mundry, R., and Wright, T. F. (2014). A test of multiple hypotheses for the function of call sharing in female budgerigars, Melopsittacus undulatus. Behavioral Ecology and Sociobiology, 68: 145-161.
Devoogd, T. J., Krebs, J. R., Healy, S. D., and Purvis, A. (1993).
Relations between song repertoire size and the volume of brain nuclei related to song: comparative evolutionary analyses amongst oscine birds. Proceedings of the Royal Society of London. Series B: Biological Sciences, 254: 75-82.
Farabaugh, S. M., Linzenbold, A., and Dooling, R. J. (1994). Vocal plasticity in Budgerigars (Melopsittacus undulatus): evidence for social factors in the learning of contact calls. Journal of Comparative Psychology, 108: 81.
Freeberg, T. M., and Lucas, J. R. (2012). Information theoretical approaches to chick-a-dee calls of Carolina chickadees (Poecile carolinensis). Journal of Comparative Psychology, 126: 68.
Gasc, A., Sueur, J., Jiguet, F., Devictor, V., Grandcolas, P., Burrow, C., and Pavoine, S. (2013). Assessing biodiversity with sound: Do acoustic diversity indices reflect phylogenetic and functional diversities of bird communities? Ecological Indicators, 25: 279-287.
Garamszegi, L. Z., Boulinier, T., Moller, A. P., Torok, J., Michl, G., and Nichols, J. D. (2002). The estimation of size and change in composition of avian song repertoires. Animal Behaviour, 63: 623-630.
Garamszegi, L. Z., Balsby, T. J. S., Bell, B. D., Borowiec, M., Byers, B. E., Draganoiu, T., Eens, M., Forstmeier, W., Galeotti, P., Gil, D., Gorissen, L., Hansen, P., Lampe, H. M., Leitner, S., Lontkowski, J., Nagle, L., Nemeth, E., Pinxten, R., Rossi, J. M., Saino, N., Tanvez, A., Titus, R., Torok, J., Van Duyse, E., and Muller, A. P. (2005). Estimating the complexity of bird song by using capture-recapture approaches from community ecology. Behavioral Ecology and Sociobiology, 57: 305-317.
Gerhardt, H. C., and Huber, F. (2002). Acoustic communication in insects and anurans: common problems and diverse solutions. University of Chicago Press.
Gibb, R., Browning, E., Glover-Kapfer, P., and Jones, K. E. (2019). Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring.
Methods in Ecology and Evolution, 10: 169-185.
Goffinet, J., Mooney, R., and Pearson, J. (2019). Inferring low-dimensional latent descriptions of animal vocalizations. bioRxiv, 811661.
Harris, S. A., Shears, N. T., and Radford, C. A. (2016). Ecoacoustic indices as proxies for biodiversity on temperate reefs. Methods in Ecology and Evolution, 7: 713-724.
Hershey, S., Chaudhuri, S., Ellis, D. P., Gemmeke, J. F., Jansen, A., Moore, R. C., and Slaney, M. (2017). CNN architectures for large-scale audio classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE.
Imperl, B., Kačič, Z., and Horvat, B. (1997). A study of harmonic features for the speaker recognition. Speech Communication, 22: 385-402.
Janik, V. M. (2009). Acoustic communication in delphinids. Advances in the Study of Behavior, 40: 123-157.
Kaufman, L., and Rousseeuw, P. J. (2009). Finding groups in data: an introduction to cluster analysis (Vol. 344). John Wiley and Sons.
Keen, S., Ross, J. C., Griffiths, E. T., Lanzone, M., and Farnsworth, A. (2014). A comparison of similarity-based approaches in the classification of flight calls of four species of North American wood-warblers (Parulidae). Ecological Informatics, 21: 25-33.
Kershenbaum, A., Freeberg, T. M., and Gammon, D. E. (2015). Estimating vocal repertoire size is like collecting coupons: a theoretical framework with heterogeneity in signal abundance. Journal of Theoretical Biology, 373: 1-11.
Kroodsma, D. E., Miller, E. H., and Ouellet, H. (Eds.). (1982). Acoustic Communication in Birds: Song learning and its consequences (Vol. 2). Academic Press.
Kroodsma, D. E., and Miller, E. H. (Eds.). (1996). Ecology and evolution of acoustic communication in birds. Comstock Publishing.
Laiolo, P., Vögeli, M., Serrano, D., and Tella, J. L. (2008). Song diversity predicts the viability of fragmented bird populations. PLoS One, 3: e1822.
Langbauer, W. R., Payne, K. B., Charif, R.
A., Rapaport, L., and Osborn, F. (1991). African elephants respond to distant playbacks of low-frequency conspecific calls. Journal of Experimental Biology, 157: 35-46.
Leighton, G. M. (2017). Cooperative breeding influences the number and type of vocalizations in avian lineages. Proceedings of the Royal Society B: Biological Sciences, 284: 20171508.
Linhart, P., Osiejuk, T. S., Budka, M., Šálek, M., Špinka, M., Policht, R., and Blumstein, D. T. (2019). Measuring individual identity information in animal signals: Overview and performance of available identity metrics. Methods in Ecology and Evolution, 10: 1558-1570.
Lyon, R. H., and Ordubadi, A. (1982). Use of cepstra in acoustical signal analysis. Journal of Mechanical Design, 104: 303-306.
Maaten, L. V. D., and Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9: 2579-2605.
Marler, P. R. (1982). Avian and primate communication: The problem of natural categories. Neuroscience and Biobehavioral Reviews, 6: 87-94.
Mason, N. A., Burns, K. J., Tobias, J. A., Claramunt, S., Seddon, N., and Derryberry, E. P. (2017). Song evolution, speciation, and vocal learning in passerine birds. Evolution, 71: 786-796.
Michie, D., Spiegelhalter, D. J., and Taylor, C. C. (1994). Machine learning. Neural and Statistical Classification, 13.
Mullen, S. P., Mendelson, T. C., Schal, C., and Shaw, K. L. (2007). Rapid evolution of cuticular hydrocarbons in a species radiation of acoustically diverse Hawaiian crickets (Gryllidae: Trigonidiinae: Laupala). Evolution, 61: 223-231.
Owren, M. J., Seyfarth, R. M., and Hopp, S. L. (1992). Categorical vocal signaling in nonhuman primates. Studies in emotion and social interaction. Nonverbal vocal communication: Comparative and developmental approaches, 102-122.
Payne, R. B. (1986). Bird songs and avian systematics. In Current ornithology. Springer, Boston, MA.
Peshek, K. R., and Blumstein, D. T. (2011).
Can rarefaction be used to estimate song repertoire size in birds? Current Zoology, 57: 300-306.
Pieretti, N., Farina, A., and Morri, D. (2011). A new methodology to infer the singing activity of an avian community: The Acoustic Complexity Index (ACI). Ecological Indicators, 11: 868-873.
Pijanowski, B. C., Villanueva-Rivera, L. J., Dumyahn, S. L., Farina, A., Krause, B. L., Napoletano, B. M., and Pieretti, N. (2011). Soundscape ecology: the science of sound in the landscape. BioScience, 61: 203-216.
Podos, J., Lahti, D. C., and Moseley, D. L. (2009). Vocal performance and sensorimotor learning in songbirds. Advances in the Study of Behavior, 40: 159-195.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66: 846-850.
Ryan, M. J. (1985). The túngara frog: a study in sexual selection and communication. University of Chicago Press.
Sainburg, T., Thielk, M., and Gentner, T. Q. (2019). Latent space visualization, characterization, and generation of diverse vocal communication signals. bioRxiv, 870311.
Salamon, J., Jacoby, C., and Bello, J. P. (2014). A data set and taxonomy for urban sound research. In Proceedings of the 22nd ACM international conference on Multimedia, 1041-1044.
Sewall, K. B., Soha, J. A., Peters, S., and Nowicki, S. (2013). Potential trade-off between vocal ornamentation and spatial ability in a songbird. Biology Letters, 9: 20130344.
Seyfarth, R. M., and Cheney, D. L. (2003). Signalers and receivers in animal communication. Annual Review of Psychology, 54: 145-173.
Smith-Vidaurre, G., Araya-Salas, M., and Wright, T. F. (2019). Individual signatures outweigh social group identity in contact calls of a communally nesting parrot. Behavioral Ecology, 31: 448-458.
Stowell, D., and Plumbley, M. D. (2014). Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. PeerJ, 2: e488.
Stowell, D., Wood, M.
D., Pamuła, H., Stylianou, Y., and Glotin, H. (2019). Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge. Methods in Ecology and Evolution, 10: 368-380.
Sueur, J., Pavoine, S., Hamerlynck, O., and Duvail, S. (2008a). Rapid acoustic survey for biodiversity appraisal. PLoS One, 3: e4065.
Sueur, J., Aubin, T., and Simonis, C. (2008b). Seewave, a free modular tool for sound analysis and synthesis. Bioacoustics, 18: 213-226.
Sueur, J., Farina, A., Gasc, A., Pieretti, N., and Pavoine, S. (2014). Acoustic indices for biodiversity assessment and landscape investigation. Acta Acustica united with Acustica, 100: 772-781.
Sullivan-Beckers, L., and Cocroft, R. B. (2010). The importance of female choice, male-male competition, and signal transmission as causes of selection on male mating signals. Evolution, 64: 3158-3171.
Tobias, J. A., and Seddon, N. (2009). Signal design and perception in Hypocnemis antbirds: evidence for convergent evolution via social selection. Evolution, 63: 3168-3189.
Ulloa, J. S., Aubin, T., Llusia, D., Bouveyron, C., and Sueur, J. (2018). Estimating animal acoustic diversity in tropical environments using unsupervised multiresolution analysis. Ecological Indicators, 90: 346-355.
Valletta, J. J., Torney, C., Kings, M., Thornton, A., and Madden, J. (2017). Applications of machine learning in animal behaviour studies. Animal Behaviour, 124: 203-220.
Wadewitz, P., Hammerschmidt, K., Battaglia, D., Witt, A., Wolf, F., and Fischer, J. (2015). Characterizing vocal repertoires: Hard vs. soft classification approaches. PLoS One, 10: e0125785.
Wildenthal, J. L. (1965). Structure in primary song of the mockingbird (Mimus polyglottos). The Auk, 82: 161-189.
Wilkins, M. R., Seddon, N., and Safran, R. J. (2013). Evolutionary divergence in acoustic signals: causes and consequences. Trends in Ecology and Evolution, 28: 156-166.
Wolberg, G. (1990). Digital image warping, Vol. 10662. Los Alamitos, CA: IEEE Computer Society Press.
Wrege, P. H., Rowland, E. D., Keen, S., and Shiu, Y. (2017). Acoustic monitoring for conservation in tropical forests: examples from forest elephants. Methods in Ecology and Evolution, 8: 1292-1301.

CHAPTER 3

PATTERNS OF VOCAL CONVERGENCE ARE SHAPED BY OPPOSING FORCES: EVIDENCE FROM WILD SONGBIRDS

Sara Keen¹,²

¹ Department of Neurobiology and Behavior, Cornell University, Ithaca, NY
² Cornell Lab of Ornithology, 159 Sapsucker Woods Rd, Ithaca, NY

ABSTRACT

Vocal convergence is widespread across species. To date, a universal set of factors or conditions that lead to convergence has not been identified. A central challenge in predicting when vocal convergence is expected is that both ecological and social environment can influence the benefits an individual may gain from convergence. Here, I develop and test a series of models that incorporate ecological and social context in order to predict whether signal convergence or divergence is expected in a given system, as well as the optimal level of convergence or divergence an individual will exhibit. I also consider the specific case of movement between populations in order to predict the optimal amount of convergence effort for immigrants and residents. I test these predictions using empirical data collected from a wild population of great tits, Parus major, a species known to exhibit song sharing among neighbors. The convergence model predicts that there will be partial, but not full, vocal convergence among neighbors, and that immigrants, not residents, should make an effort to converge with neighbors’ songs. This is premised upon the hypothesis that immigrants, who initially exhibit low vocal similarity with residents, can significantly increase their fitness by converging with residents, whereas residents can only marginally increase their fitness by converging with a newly arrived immigrant.
We found that levels of acoustic similarity were higher among neighbors sharing territory boundaries, but that neighbors did not exhibit complete convergence, supporting model predictions. Using playbacks of non-local songs to simulate the arrival of immigrants, we found that residents did not exert convergence effort, as predicted by the model. We consider how our model might help to explain patterns of signal convergence in many species, and suggest that vocal convergence may be a dynamic, context-dependent trait which can be shaped by an individual’s current environment.

INTRODUCTION

In many animal societies, cooperative and competitive interactions are mediated by an individual’s ability to signal group membership or identity (Sherman et al. 1997, Tibbetts and Dale 2007, Bradbury and Vehrencamp 2011). In many taxa, vocal signals encode information about population or group membership, which is often conveyed by highly similar signal structure among group members (e.g., parrots, Wright 1996; songbirds, Catchpole and Slater 2008; hummingbirds, Gaunt et al. 1994; cetaceans, Deecke et al. 2000, Garland et al. 2011; bats, Boughman 1997). The phenomenon of nearby conspecifics converging upon similar vocalizations has received considerable attention from researchers (Wright and Dahlin 2018), and several hypotheses have been proposed to explain the emergence and maintenance of such geographical variation (Payne 1982, Baker and Cunningham 1985, Podos and Warren 2007). Although vocal convergence is widespread, the observed degree of convergence and the spatial scale on which it occurs vary among species, and no single suite of ecological or social factors has been found to explain its presence or absence (Podos and Warren 2007, Wright and Dahlin 2018). Bird song is a useful trait in which to explore vocal convergence, as songs and calls often exhibit within-species geographic variation (Catchpole and Slater 2008).
Dialects, in which nearby conspecifics converge upon acoustically similar signals that vary widely among locations, have been observed in dozens of species since they were first documented in white-crowned sparrows by Marler and Tamura (1964), and have prompted a number of researchers to posit explanations for the evolution of vocal convergence (Baker and Cunningham 1985, reviewed by Podos and Warren 2007 and Wright and Dahlin 2018). These hypotheses aim to explain why vocal convergence is favored by selection, and fall into three non-mutually exclusive classes.

The first class can broadly be described as the social benefits hypothesis, which suggests that individuals benefit from using vocalizations similar to those of nearby birds because the social costs of using nonlocal signals are high. This encompasses the colony password hypothesis (Feekes 1977), which suggests that dialects signal social group membership and serve to identify nonlocal intruders, as well as the deceptive mimicry hypothesis (Payne 1982), which proposes that less dominant or nonlocal individuals adopt local songs used by dominant birds in order to gain acceptance and deter competitors. The social benefits hypothesis suggests that, in addition to helping individuals gain group acceptance, using local signal types can also facilitate cooperation, group cohesion, or coordinated behavior with particular individuals (Wilkinson and Boughman 1998, Vehrencamp et al. 2003, Janik and Slater 1998).

The second class of hypotheses suggests that convergence could be shaped by sexual selection acting through mate choice. In this scenario, potential mates would exhibit a preference for vocalizations used in their population, as this could signal that individuals were born nearby and may therefore be better adapted to the local environment (i.e., coadapted gene complex; Baker 1982).
Therefore, using local signal types could offer a selective advantage through enhanced mate attraction, as has been shown in several species in which females prefer local signal types (e.g., Searcy et al. 2002, Lachlan et al. 2014).

The third class of hypotheses suggests that vocal convergence may arise because particular signals are better adapted to the local environment (i.e., the acoustic adaptation hypothesis, Morton 1975). In this case, local signal types confer a selective advantage through more effective transmission to receivers. Levels of transmission loss are known to vary with habitat characteristics (Bradbury and Vehrencamp 2011), and several studies of bird song in urban areas demonstrate that signal structure is under selection to best fit the environment (e.g., Slabbekoorn and Peet 2003, Luther and Derryberry 2012).

A final possibility that could serve as a null hypothesis is that geographical variation arises through cultural drift (i.e., the epiphenomenon hypothesis, Wiens 1982). For example, copying errors can give rise to distinct song variants in different populations, and a particular variant may be adopted by a population not because it offers some selective advantage, but because it is the model to which many juveniles are exposed. In this case, convergence offers no selective advantage and is analogous to random genetic drift with local fixation.

The many hypotheses proposed to explain vocal convergence reflect not only the sustained interest in this phenomenon, but also the large amount of variation in patterns of vocal convergence observed among species. For example, vocal convergence can take the form of sharing a single call type as well as partial or full overlap of song repertoires, and the timescale on which vocal convergence persists can vary from weeks to decades (Kroodsma 2004).
Furthermore, although many of the hypotheses above specify the mechanisms by which convergence would be expected to arise, the exact predictions made by each will depend on species characteristics, such as dispersal distance, repertoire size, and spatial distribution of breeding territories (Baker and Cunningham 1985, Podos and Warren 2007). Many past studies offer support for the social benefits hypothesis (Wright and Dahlin 2018), though at present there appears to be no single hypothesis that provides a universal explanation for vocal convergence. Although this broad phenomenon may not be described by a single theory, researchers agree that vocal convergence is shaped by a complex suite of social and ecological factors. Here, we suggest that by considering these factors as parameters in an optimality model, we can estimate the expected amount of vocal convergence within a given population.

In order to make predictions about the conditions under which vocal convergence is expected, it is important to differentiate vocal convergence from counter singing. The former is a population-level process that persists over a period of time on the order of weeks or longer. In contrast, counter singing, or vocal matching, is an interactive display that often occurs between two individuals, typically during an agonistic encounter such as the negotiation or defense of territory boundaries (McGregor et al. 1992, Vehrencamp 2001, King and McGregor 2016). Like vocal convergence, counter singing entails matching the signals of others, and indeed the need to counter sing with competitors may be among the suite of factors that gives rise to vocal convergence or song sharing (MacDougall-Shackleton 1997, Nelson 2000). However, because vocal matching occurs in the context of aggressive interactions, whereas vocal convergence is the resulting pattern observed across a group or population, these processes are functionally dissimilar and therefore yield different predictions.
In other words, vocal convergence is an ontogenetic process, whereas counter singing serves an adaptive function in the moment.

Central to understanding the processes that shape vocal convergence is determining the fitness advantage an individual stands to gain by resembling nearby conspecifics. In addition to the hypotheses above, which suggest that individuals may increase their fitness via vocal convergence, we must also consider scenarios in which individuals benefit from being dissimilar to nearby conspecifics. Dissimilar signals are known to be advantageous in social systems in which individual recognition is important for avoiding aggression that is intended for others (Dale et al. 2001, Sheehan and Tibbetts 2009) and have been suggested to enhance offspring recognition (Medvin et al. 1993). Signal divergence might also be favored by sexual selection, for example when signaling individual identity offers a selective advantage in mate attraction (Thom et al. 2012). In this case, individuals are subject to some penalty if they do not sufficiently differentiate themselves from other members of the population. Additionally, sexual selection might lead to divergence when signals serve as honest indicators of quality, leading to divergent signals among signalers of varying quality. Examples of this may include cases in which using more elaborate songs or larger repertoires increases access to mating opportunities (e.g., Read and Weary 2002, Snell-Rood and Badyaev 2008).

To evaluate the costs and benefits of using vocal signals that are similar or dissimilar to those of nearby conspecifics under different conditions, we first consider two models that represent alternative scenarios: selection for signal convergence and selection for signal divergence. Here, the term divergence refers to signal dissimilarity, and thus selection for divergence would result in greater overall diversity within a population.
For each case, we estimate the amount of effort that a focal individual should invest in converging with or diverging from nearby individuals in the same group or population (hereafter neighbors) in order to maximize their fitness, and list the specific hypotheses generated from each model in Table 1. We also present a combined model, in which there is opposing selection for both convergence and divergence, that can be used to predict which scenario (convergence or divergence) is expected given the characteristics of a particular system. Regardless of whether selection favors convergence or divergence, the predicted degree of signal similarity is expected to be mediated by social and ecological factors, i.e., to be context-specific. Context-specific tuning of signal similarity is a new way in which social recognition systems can be seen as flexible, just as receiver acceptance thresholds for recognition signals have been shown to vary widely among interaction contexts (Reeve 1989, Johnstone 1997, Sheehan and Reeve 2020). The models below aim to describe the expected degree of vocal convergence or divergence in each scenario given the particular social and ecological context in which signaling occurs.

Table 1. Summary of model hypotheses and predictions for our study system.

Model: Convergence
Hypotheses: Individual fitness costs are incurred when a focal individual is very different from neighbors, and therefore non-local individuals newly arrived in a population will bear the costs of vocal convergence.
Predictions:
1. Males breeding nearby one another converge upon similar vocal signals.
2. Immigrants will change their signals to converge with those of neighbors.
3. Local birds will exert no effort to adopt nonlocal playback songs into their repertoires.

Model: Divergence
Hypotheses: Individuals increase their fitness by distinguishing themselves from neighbors. Non-local individuals that use very dissimilar signals incur no fitness costs for not matching the local population.
Predictions:
1. Males breeding nearby one another will not have higher vocal similarity than other birds within the population.
2. Immigrants will exhibit less vocal similarity with neighbors than resident males.

Model: Combined
Hypotheses: Individuals experience opposing selective forces.
Predictions: The unique system characteristics will determine if predictions match those of the convergence or divergence model. This is decided by relative values of the convergence and divergence parameters, $c_c$ and $c_d$, respectively.

I test the predictions from each model using a dataset of songs collected from a wild population of great tits (Parus major). Because males of this species disperse from natal territories to establish breeding territories during their first year (Krebs 1981), it was possible to assess whether levels of pairwise acoustic similarity between males correlate with geographic distance between their natal nests and/or breeding territories, which may give insight into whether there is selection for vocal convergence when individuals establish territories. I also evaluated whether birds were more acoustically similar to neighbors than to non-neighbors, and whether levels of acoustic similarity varied among resident birds born locally and immigrant birds born outside of the study system. Lastly, I used playbacks to test whether simulating the arrival of a nonlocal male on a neighboring territory prompted birds to incorporate nonlocal songs into their repertoires. My results below support the predictions of the convergence model: neighboring males used more similar songs than expected by chance, and their songs were more similar to songs of current neighbors than to songs of natal neighbors. I also observed that immigrant birds exhibited the same levels of vocal convergence as residents, and that birds did not adopt nonlocal songs used in playbacks.
Together, my results suggest that individuals benefit from matching neighbors’ signals and that immigrants show a higher likelihood of investment in vocal convergence than residents. These findings also suggest that the observed levels of vocal convergence may reflect the tradeoff between benefits of convergence and the costs of acquiring new signals. The models we present here may help to explain patterns of vocal convergence found in different systems and to unify observations under a general framework.

METHODS

Models. I propose three models describing the expected level of vocal similarity between a focal individual and their neighbors under selection for convergence, divergence, or opposing selection for both outcomes. For each case, I predict the optimal amount of convergence or divergence effort an individual should exert given possible values of the parameters listed in Table 2.

Table 2. Model parameters and variables.

Parameter or variable: $\omega$
Definition: Focal individual’s fitness
Explanation and examples: The fitness of the focal individual, which can increase when the individual exerts effort for convergence or divergence under the appropriate conditions. Examples of fitness benefits due to signal convergence or divergence include increased access to mating opportunities and acceptance into a social group that enables increased access to resources.

Parameter or variable: $d$
Definition: Acoustic distance
Explanation and examples: The initial distance between the signal of the focal individual and the signals of all neighbors. The manner in which distance is calculated depends on the vocal characteristics of the system in question. For example, among birds that have repertoires of multiple song types, $d$ might represent the amount of song sharing between individuals.

Parameter or variable: $x$
Definition: Change in acoustic distance
Explanation and examples: The amount by which $d$ increases or decreases due to effort for convergence or divergence made by the focal individual. For example, $x$ could reflect the reduction in acoustic distance resulting from an immigrant bird acquiring a local song type.

Parameter or variable: $a$
Definition: Cost of convergence or divergence effort
Explanation and examples: This parameter controls the degree to which a given convergence or divergence effort is costly. This includes energetic costs (e.g., the physiological costs of signal production), and learning costs (e.g., time required to acquire new songs).

Parameter or variable: $c_c$, $c_d$
Definition: Sensitivity of social benefits to convergence or divergence effort
Explanation and examples: This term defines how quickly social benefits increase as the focal individual’s signal becomes increasingly similar to or dissimilar from neighbors’ signals. A social benefit can result from a reduction in some kind of social cost, e.g., increased viability as the chance of misdirected aggression received from a neighbor decreases.

The convergence model (1) describes the relationship between the focal individual’s fitness, $\omega$, and the reduction in distance between their signal and neighbors’ signals due to positive convergence effort, $x$, when there is selection for signal convergence.

$$\omega = e^{-c_c (d - x)^2} \cdot e^{-a x} \qquad (1)$$

The first exponential term on the right side of (1) describes the social benefit of convergence, e.g., one minus the probability of the focal individual receiving injurious aggression from its neighbor, as a function of the focal individual’s effort in reducing their acoustic distance. The initial acoustic distance between the focal individual’s signal and collective neighbors’ signals is represented by $d$, and the term $(d - x)^2$ is this distance after some conformity effort, $x$, is made by the focal individual. Thus, the form of (1) encodes the assumption that the social benefit of convergence is maximized when $x = d$, i.e., the acoustic distance is reduced to zero.
The second exponential term represents the focal individual’s loss in survival or fecundity from expending effort $x$ (time or energy) in reducing the acoustic difference. Thus, the fitness-maximizing value of the effort $x$ must be one that optimally balances the social benefit of convergence and the direct cost of convergence effort. As the cost of convergence effort, $a$, increases, the maximum possible fitness decreases, as is expected in cases where the acquisition of new signals is costly, e.g., when reducing acoustic distance requires precisely adjusting the fine structure of signals, and therefore a large time investment in vocal learning. Additionally, as the sensitivity to mismatch, $c_c$, increases, the amount of signal convergence necessary to achieve the maximum possible fitness increases, as would be expected when receivers are more discerning of differences among signals. The peak fitness of the focal individual occurs when $\mathrm{d}\omega/\mathrm{d}x = 0$ and $\omega$ is a local maximum. We set the derivative in (2) to zero and solve for $x$ to find the optimal reduction in acoustic distance, $x^*$, shown in (3).

$$\frac{\mathrm{d}\omega}{\mathrm{d}x} = -e^{-c_c (d - x)^2 - a x} \left( a + 2 c_c (-d + x) \right) \tag{2}$$

$$x^* = -\frac{a}{2 c_c} + d \tag{3}$$

We can also use the solution for $x$ in (3) to verify that the second derivative is negative, confirming that the solution above is the fitness maximum. Thus, the optimal convergence effort increases as the initial acoustic distance, $d$, increases, as the effort cost rate, $a$, decreases, and as the sensitivity of social benefits to acoustic distance, $c_c$, increases. An immediate consequence is that the focal individual should try harder to converge the greater the initial acoustic distance and the more potent the social benefit of convergence (Fig. 1a,b).

The divergence model (4) describes the change in a focal individual’s fitness, $\omega$, as a function of divergence effort, $x$.
$$\omega = e^{-c_d / (d + x)^2} \cdot e^{-a x} \tag{4}$$

The first exponential term in (4) describes the social benefit of divergence, e.g., increased mating opportunities or reduced aggression intended for others as a focal individual becomes increasingly distinct from neighbors. The term $(d + x)^2$ is the square of the signal distance from collective neighbors’ signals after some divergence effort, $x$, has increased acoustic distance beyond the starting value of $d$. The value of the first exponential term increases as $x$ increases, which reflects the model assumption that social benefits increase as the focal individual increases divergence effort. Consequently, the model predicts that the social benefit increases as divergence effort increases, i.e., when acoustic distance between the focal individual and neighbors grows larger. As in the previous model, the second exponential term in (4) represents the costs of divergence effort, such as time or energy invested in acquiring or producing divergent signals. The focal individual’s fitness will be maximized when the tradeoffs between costs of divergence effort and social benefits of divergence are optimized. To find this fitness maximum, we calculate $\mathrm{d}\omega/\mathrm{d}x$ in (5), and solve for $x^*$ in (6).

$$\frac{\mathrm{d}\omega}{\mathrm{d}x} = e^{-c_d / (d + x)^2 - a x} \left( -a + \frac{2 c_d}{(d + x)^3} \right) \tag{5}$$

$$x^* = \frac{2^{1/3} c_d^{1/3}}{a^{1/3}} - d \tag{6}$$

The solution for the fitness maximum in (6) shows that the optimal divergence effort decreases as the initial signal distance, $d$, increases and as the cost of divergence, $a$, increases. Additionally, when the effort cost, $a$, is small relative to the sensitivity, $c_d$, the focal individual will exert more divergence effort as the sensitivity increases, as would be expected when receivers are increasingly better at distinguishing between similar signals and the costs of divergence effort are negligible. Therefore, this model predicts that individuals will exert more divergence effort when initial acoustic distance is smaller and the ratio of $c_d$ to $a$ is higher (Fig. 1c,d).
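The closed-form optima in (3) and (6) can be checked numerically. The following is a minimal Python sketch, not the MATLAB code used for the figures: it evaluates the fitness functions (1) and (4) on a grid and compares the grid maximum with the analytic solution. The function names and the parameter values in the demo (d, a, c_c, c_d) are illustrative assumptions and are not taken from the dissertation.

```python
import math


def convergence_fitness(x, d, a, c_c):
    """Fitness under the convergence model, eq. (1)."""
    return math.exp(-c_c * (d - x) ** 2) * math.exp(-a * x)


def divergence_fitness(x, d, a, c_d):
    """Fitness under the divergence model, eq. (4)."""
    return math.exp(-c_d / (d + x) ** 2) * math.exp(-a * x)


def optimal_convergence_effort(d, a, c_c):
    """Analytic optimum from eq. (3): x* = d - a / (2 c_c)."""
    return d - a / (2 * c_c)


def optimal_divergence_effort(d, a, c_d):
    """Analytic optimum from eq. (6): x* = (2 c_d / a)^(1/3) - d."""
    return (2 * c_d / a) ** (1 / 3) - d


def grid_argmax(fitness, lo, hi, steps=20001):
    """Return the grid point in [lo, hi] with the highest fitness.

    A brute-force search, used here only to cross-check the formulas.
    """
    best_x, best_w = lo, fitness(lo)
    for i in range(1, steps):
        x = lo + (hi - lo) * i / (steps - 1)
        w = fitness(x)
        if w > best_w:
            best_x, best_w = x, w
    return best_x


if __name__ == "__main__":
    # Assumed, illustrative parameter values.
    d, a, c_c, c_d = 5.0, 0.1, 0.05, 50.0
    print("convergence: analytic", optimal_convergence_effort(d, a, c_c),
          "grid", grid_argmax(lambda x: convergence_fitness(x, d, a, c_c), 0.0, 10.0))
    print("divergence:  analytic", optimal_divergence_effort(1.0, a, c_d),
          "grid", grid_argmax(lambda x: divergence_fitness(x, 1.0, a, c_d), 0.0, 20.0))
```

Note that (6) returns a negative value when the initial distance already exceeds $(2 c_d / a)^{1/3}$; the sketch does not clip this case, since the text does not state how it should be interpreted.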
The combined model (7) describes individual fitness when there is opposing selection for both convergence and divergence. We assume that there are two multiplicative social benefits, one due to convergence benefits and one due to divergence benefits, and ignore the direct cost term for simplicity.

$$\omega = e^{-c_c (d - x)^2} \cdot e^{-c_d / (d - x)^2} \tag{7}$$

Here, the term $x$ is the change in signal distance resulting from the focal individual’s effort, which may result in being either more similar to or more different from neighbors (i.e., $x$ in this model is allowed to be either positive or negative, with a positive value indicating convergence and a negative value indicating divergence). This model describes a scenario in which an individual experiences two multiplicative social benefits, one maximized by signal convergence and the other by signal divergence. Thus, the relative values of $c_d$ and $c_c$ influence the focal individual’s optimal strategy. For example, when $c_c$ is large and $c_d$ approaches zero, the second exponential term in (7) can be ignored and the focal individual’s fitness is maximized when $x$ approaches $d$, i.e., when there is effort for convergence. We seek to find the conditions under which the product of the social benefits should lead to positive $x$ (convergence) or negative $x$ (divergence). We first find the derivative of (7) with respect to $x$, yielding

$$\frac{\mathrm{d}\omega}{\mathrm{d}x} = e^{-c_c (d - x)^2 - c_d / (d - x)^2} \left( -\frac{2 c_d}{(d - x)^3} + 2 c_c (d - x) \right) \tag{8}$$

We then evaluate the sign of this derivative for $x = 0$, which tells us whether selection at an acoustic difference $d$ should cause subsequent increases in $x$ (convergence) or decreases in $x$ (divergence). The result is that convergence should be favored over divergence as long as

$$c_c > \frac{c_d}{d^4} \tag{9}$$

Thus, convergence should occur when $c_c$ is large relative to $c_d$ and the initial acoustic distance, $d$, is large enough that $c_d / d^4$ falls below $c_c$. Suppose that there is stronger selection for convergence than for divergence (i.e., $c_c \gg c_d$) so that the pure convergence model described above applies.
Suppose further that an immigrant enters into and acquires a territory in a stable population. If $c_c \gg c_d$, it follows that all individuals in the local population, hereafter residents, have previously attained some optimal amount of convergence with one another. For the case where the residents have negligible signal distance with each other, let the immigrant differ from each of the residents by an initial distance, $d$. In this case, the immigrant’s fitness as a function of its own convergence effort $x$ is given by:

$$\omega_{\mathrm{imm}} = \left( e^{-c_c (d - x)^2} \right)^{n} \cdot e^{-a x} \tag{10}$$

Here, $n$ is the number of neighbors (i.e., residents) surrounding the immigrant, and the fitness expression reflects that the immigrant now must survive costly interactions with each of the $n$ residents. Using the approaches above, the immigrant’s optimal convergence effort is equal to

$$x^* = d - \frac{a}{2 c_c n} \tag{11}$$

Therefore, an immigrant should exert more effort to converge on the local signal type (i.e., $x$ must more closely approach $d$) when there are more residents (Fig. 2a). We can also describe the fitness of a resident in relation to their change in signal distance after a single immigrant arrives as a function of its own effort $x$ to converge with the immigrant:

$$\omega_{\mathrm{res}} = e^{-c_c (d - x)^2} \cdot \left( e^{-c_c x^2} \right)^{n - 1} \cdot e^{-a x} \tag{12}$$

In this case, a resident whose signal approaches an immigrant’s signal by a distance $x$ gains convergence benefits with the immigrant, but lowers its convergence benefits from the other $n - 1$ residents because it has increased its signal distance by $x$ with the signals of the other residents. Using the approaches above, the resident’s optimal convergence effort is equal to:

$$x^* = \frac{d}{n} - \frac{a}{2 c_c n} \tag{13}$$

A comparison of the immigrant’s and resident’s optimal convergence efforts in (11) and (13), respectively, reveals that the optimal convergence effort for the resident is lower than that of the immigrant, particularly when $n$ is high.
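The comparison between (11) and (13) can be made concrete with a short numerical sketch. The Python code below is illustrative only: it tabulates the immigrant’s and the resident’s optimal effort for several neighbor counts and provides the log-fitness functions implied by (10) and (12) for cross-checking. The function names and the values chosen for d, a, and c_c are assumptions made for this example, not values reported in the text.

```python
def log_fitness_immigrant(x, d, a, c_c, n):
    """Natural log of eq. (10): n convergence terms plus the effort cost."""
    return -n * c_c * (d - x) ** 2 - a * x


def log_fitness_resident(x, d, a, c_c, n):
    """Natural log of eq. (12): one term for the immigrant, n - 1 for residents."""
    return -c_c * (d - x) ** 2 - (n - 1) * c_c * x ** 2 - a * x


def immigrant_optimal_effort(d, a, c_c, n):
    """Eq. (11): x* = d - a / (2 c_c n)."""
    return d - a / (2 * c_c * n)


def resident_optimal_effort(d, a, c_c, n):
    """Eq. (13): x* = d / n - a / (2 c_c n)."""
    return d / n - a / (2 * c_c * n)


def effort_table(d, a, c_c, neighbor_counts):
    """Return (n, immigrant x*, resident x*) for each neighbor count."""
    return [
        (n,
         immigrant_optimal_effort(d, a, c_c, n),
         resident_optimal_effort(d, a, c_c, n))
        for n in neighbor_counts
    ]


if __name__ == "__main__":
    # Assumed, illustrative parameter values.
    for n, x_imm, x_res in effort_table(d=5.0, a=0.1, c_c=0.05,
                                        neighbor_counts=[1, 2, 4, 8, 16]):
        print(f"n={n:2d}  immigrant x*={x_imm:.3f}  resident x*={x_res:.3f}")
```

With a single neighbor the two expressions coincide. As $n$ grows, the immigrant’s optimum approaches $d$ while the resident’s optimum shrinks in proportion to $1/n$.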
In fact, for high enough $n$, the resident’s optimal convergence effort approaches zero, as the arrival of a single immigrant into a population of many residents does not significantly alter the signal space in which residents exist. These differences in optimal convergence effort of immigrants and residents are illustrated in Fig. 2.

An assumption in all models is that selection acts upon the amount of signal similarity between the focal individual and neighbors. Therefore, variation is expected in the amount of vocal convergence that focal individuals exhibit, but the models do not require that individuals can actively adjust the amount of effort they invest in convergence or mismatching. For example, when considering vocal signals used by territorial songbirds, the mechanism underlying convergence effort may be song acquisition via post-dispersal social learning, intentional settlement near territory holders using similar songs, or it could be that individuals exhibit natal philopatry and therefore can maximize convergence by remaining near their natal nest. In all models, fitness costs result from a combination of ecological and social factors, and thus representing these using a single term allows our models to be broadly applicable. I created plots of model predictions using MATLAB 2015a (The Mathworks Inc., Natick, MA, USA).

Figure 1. Convergence and divergence model predictions. The fitness of focal individuals (y-axis) corresponds to the change in signal distance from neighbors due to effort from the focal individual, $x$ (x-axis). Under selection for convergence, as the sensitivity to mismatch, $c_c$, increases, higher levels of convergence effort, $x$, are required to achieve maximum fitness, and higher initial signal distance, $d$, reduces fitness maxima, as shown in a) $d = 5$, and b) $d = 15$.
Under selection for divergence, higher sensitivity to mismatch, $c_d$, requires higher levels of divergence effort, $x$, to reach maximum possible fitness, and higher initial signal distance, $d$, increases fitness maxima, as shown in c) $d = 1$, and d) $d = 5$. In all plots, the initial distance, $d$, between the focal individual’s signal and neighbors’ signals is shown by the dotted black line, and fitness maxima are indicated with dashed lines.

Figure 2. Convergence model predictions for immigrants and residents. As the number of neighbors increases, a) immigrants will exert more effort to converge with residents’ signals (shown with $a = 0.1$), and b) residents will exert less effort to match immigrants (shown with $a = 0.01$). The initial distance, $d$, between the focal individual’s signal and neighbors’ signals is shown by the dotted black line, and fitness maxima are indicated with dashed lines.

Study system. I collected songs from a wild population of great tits in Wytham Woods, Oxfordshire, UK (51°46′ N, 01°20′ W). This population is part of a long-term breeding study that annually monitors great tits using nest boxes placed within the woods. Great tits are preferential cavity nesters, and nearly all individuals breeding in Wytham Woods use nest boxes that are monitored by field assistants during annual data collection. Birds in this study are fitted with standard British Trust for Ornithology (BTO) metal leg bands as well as Passive Integrated Transponders (PIT tags), which were used to identify individuals in this study. Great tits are year-round residents and begin claiming territories in February or March each year, approximately four weeks before the onset of breeding (Firth and Sheldon 2015). Juvenile male great tits acquire songs from nearby adults and are thought to be able to acquire new songs after dispersal to breeding territories in their first year (Rivera-Gutierrez et al. 2011).
Males use repertoires that include between one and nine unique song types, which they use during dawn chorus displays during the breeding season (McGregor and Krebs 1982). Using data from the long-term study, I identified all focal birds as residents, dispersers, or immigrants. Residents were defined as any birds that were born in Marley Plantation (the region of Wytham Woods where data were collected for this study) or any bird that had bred previously within Marley Plantation. Dispersers were defined as birds born within or having previously bred within Wytham Woods, but outside of Marley Plantation. Immigrants were defined as birds that were not born in and had not previously bred within Wytham Woods. Birds that could not be identified using BTO rings or PIT tags were classified as unknown. Using nest monitoring data collected by field assistants, I identified all nest boxes occupied by great tits in the study area, and mapped territories using Thiessen polygons, which have been shown to closely match the regions occupied by breeding males in this system (Wilkin et al. 2007). I defined all pairs of birds sharing a territory boundary as neighbors.

Data collection. I collected recordings at 54 great tit nests between 30 March and 15 May in 2017-2019 (N = 21, 16, 17 for each year) using Swift acoustic recorders (Cornell Center for Conservation Bioacoustics, Ithaca, NY). The recorders collected sounds from focal nests continuously from 0500-0900 daily and saved recordings as WAV files with a 32 kHz sampling rate and 16-bit precision. I analyzed the songs used by focal birds during dawn chorus displays on three consecutive mornings, as this has been shown to be sufficient to sample an individual’s complete repertoire (Rivera-Gutierrez et al. 2011).
I ensured that the mornings sampled from each individual were collected either during the egg laying period of the mate of the focal male or within five days before the onset of egg laying, as this is the period of peak dawn chorus output (Mace 1987). I analyzed only songs used in males’ dawn chorus display, which was defined as any songs produced within 90 min of civil twilight (sensu Mace 1987). This ensured the dataset did not include songs used during counter-singing. I used Raven Pro 1.5 (Cornell Bioacoustics Research Program, 2014) to generate spectrograms of recordings with a Hann window function, 1024-point Fourier transforms, and 50% window overlap. A research assistant trained to identify great tit songs created separate Raven selections for all dawn chorus songs. Using the technique described in Keen et al. (in prep), songs from an individual male were classified as distinct song types. I then selected the five samples of each song type produced by a single bird that had the highest signal-to-noise ratio, and used this representative subset of songs to obtain acoustic measurements for every bird.

Acoustic analysis. In order to calculate the amount of vocal similarity among birds, I collected several acoustic measurements of all songs in the dataset. These included 28 spectro-temporal measurements from every recorded song using the warbleR R package (Araya-Salas and Smith-Vidaurre, 2017), 181 descriptive statistics of Mel-frequency cepstral coefficients (MFCCs; Lyon and Ordubadi 1982, sensu Salamon et al. 2014), as well as measurements generated from similarity matrices produced by spectrogram cross correlation (Clark et al. 1987) and dynamic time warping (Wolberg 1990). The song measurement vectors were collated into a single matrix for each bird. Using the method described in Chapter 2, I assigned every recorded song into a class representing a distinct song type, and selected the 15 songs with highest signal-to-noise ratio from each class.
I then averaged all measurements from this subset of songs from a single bird, resulting in one measurement vector per individual. Finally, I calculated vocal differences between pairs of individuals as Euclidean distance between these vectors, and hereafter refer to this value as pairwise acoustic distance. To determine whether geographic distance between birds is correlated with acoustic similarity, I used Mantel tests to compare pairwise acoustic similarity with pairwise distance between breeding territories. For resident birds born within Wytham Woods, I used a separate Mantel test to compare pairwise acoustic similarity with pairwise distance between natal nests. This allowed me to evaluate whether birds exhibited vocal convergence with nearby conspecifics after dispersing to breeding territories, which would suggest that there is context-dependent selection pressure for vocal convergence. To calculate levels of vocal convergence among neighbors, I first calculated the mean pairwise acoustic similarity between a focal individual and all of its neighbors, as well as the mean pairwise acoustic similarity between that individual and all non-neighbors. This was repeated for every bird in the dataset. Non-neighbors were defined as all birds recorded within the same season that did not share a territory boundary with the focal bird. To test whether vocal convergence among neighbors was higher than among non-neighbors, I used a linear mixed model (LMM) with pairwise acoustic similarity as a response variable, pairwise relationship (neighbors or non-neighbors) as a fixed effect, and reference bird as a random effect. To evaluate whether the levels of acoustic similarity with neighbors and non-neighbors were different for residents, immigrants, and dispersers, I compared acoustic similarity with neighbors and acoustic similarity with non-neighbors using two one-way ANOVAs and Tukey post hoc tests to make comparisons among birds from each class.

Novel song playbacks.
In order to test the convergence model predictions for residents and immigrants, I used a playback experiment to simulate the arrival of an immigrant bird into an established population of residents. I conducted ten playbacks using unique recordings of non-local great tit songs at ten locations within the study system (Fig. 3). All song recordings used in playbacks were acquired from the online archive Xeno-Canto (xeno-canto.org). I selected songs that were recorded in mainland Europe and ensured that songs did not resemble those used by local birds. I created ten unique MP3 sound files using Audacity 2.3.1 (audacityteam.org). Playback sounds comprised 1 minute of a novel song type sung repeatedly followed by 10 seconds of silence, repeated 15 times. I programmed AGPTEK A02 MP3 players to automatically play MP3 files at 0900, 0930, 1000, 1030, and 1100 daily for five consecutive days. All MP3 players were connected to ANKER SoundCore 2 Bluetooth speakers, which were placed on an empty neighboring territory and faced in the direction of the focal nest. For all replicates, speakers were approximately 100 m from the focal nest. I placed Swift acoustic recorders at focal nests approximately 5 days before playbacks began. The recorders collected data continuously from 0500-0900 as in the song analysis described above.

Figure 3. Playback locations. Black dots represent nest boxes in the study site and red stars indicate focal nests at which playbacks took place.

To determine whether focal birds adjusted their singing behavior after the onset of playbacks, I compared songs used during the three mornings preceding playbacks to the songs used on the third, fourth, and fifth mornings after playbacks began. Songs used by focal birds were identified within recordings and analyzed following the procedure described above. I tested for changes in songs used by focal males before and after playbacks using three approaches.
First, I visually inspected spectrograms to determine whether focal birds added or removed songs from their repertoires after playbacks began. Second, I calculated the acoustic distance between the playback songs used at a focal nest and all songs used by the focal bird. I found the means of these distances for songs used before and after the start of playbacks for all focal nests. To test whether there was an overall effect in singing behavior for all birds in the experiment, I used an LMM with acoustic distance as a response variable, order (before or after) as a fixed effect, and nest as a random effect. Lastly, I used separate LMMs for all focal birds to test the effects of playbacks on singing behavior, using acoustic distance as a response variable, order (before or after) as a fixed effect, and date of recording as a random effect. I conducted all analyses in R (R Core Team 2015) and used the lmerTest package for model analysis (Kuznetsova et al. 2015).

RESULTS

I found that pairwise acoustic distance was correlated with distance between breeding nests (Mantel test: correlation = 0.28, p = 0.004, N = 54 pairs; Fig. 4a), and did not find a significant correlation between pairwise acoustic similarity and distance between natal nests (Mantel test: correlation = 0.01, p = 0.43, N = 21 pairs; Fig. 4b). Birds in the study exhibited significantly higher acoustic similarity with neighbors as compared to non-neighbors (LMM: t = 3.02, df = 54.9, p = 0.003; Fig. 4c), although the observed levels of acoustic similarity did not suggest complete vocal convergence (acoustic similarity between focal bird and neighbors (mean ± SE): 0.72 ± 0.02, focal bird and non-neighbors: 0.64 ± 0.02). There was not a significant difference in the amount of acoustic similarity that residents, immigrants, and dispersers exhibited with neighbors (ANOVA: F(2) = 0.55, p = 0.58; Tukey post hoc test: immigrants vs. residents: p = 0.56, dispersers vs. residents: p = 0.7, dispersers vs.
immigrants: p = 0.99) or non-neighbors (ANOVA: F(2) = 1.01, p = 0.37; Tukey post hoc test: immigrants vs. residents: p = 0.91, dispersers vs. residents: p = 0.36, dispersers vs. immigrants: p = 0.45; Fig. 4d).

I collected sufficient recordings for analysis at eight of the ten playback locations. Visual inspection of spectrograms showed that focal birds did not adopt novel songs used in playbacks. One individual added a song to their repertoire after playbacks began that had not been used previously, although this song did not resemble the playback song. This individual was also the only bird in our sample to show significant changes in singing after the start of playbacks (Fig. 5, Table 3). I observed that several focal birds adjusted the ratios with which they used different songs in their repertoires, but playback timing (before or after) had no detectable influence on these adjustments (Table 3).

Figure 4. a) Pairwise acoustic similarity is significantly correlated with distance between breeding territories. b) Pairwise acoustic similarity is not significantly correlated with distance between natal nests. c) Focal birds have significantly higher pairwise acoustic similarity with neighbors versus non-neighbors. d) There is not a significant difference in the amount of acoustic similarity residents, immigrants, and dispersers exhibit with either neighbors or non-neighbors.

Figure 5. Birds did not adjust songs after playbacks simulating immigrant arrival. The mean acoustic distance of focal birds’ repertoires from playback songs, shown on the y-axis, did not significantly change after playbacks began (t = 1.24, p = 0.21). Black dots represent acoustic means of focal birds (N = 8) and grey lines indicate measurements from the same bird.

Table 3. Individual results for playback replicates.
I used order (before or after) as a fixed effect and recording day as a random effect, and found that only the bird in replicate 2 significantly changed songs after the onset of playbacks.

Replicate   Estimate   Standard error   t       p
1           0.028      0.024            1.18    0.24
2           0.082      0.032            2.55    0.011
3           -0.018     0.044            -0.41   0.69
4           0.081      0.059            1.38    0.17
5           0.0        0.017            -0.03   0.98
6           0.0        0.015            0.05    0.96
7           0.052      0.042            1.22    0.22
8           0.012      0.011            1.09    0.28

DISCUSSION

Although many studies have investigated vocal convergence, the factors that determine when convergence is expected and to what extent individuals will converge are poorly understood. The models presented here make it possible to predict these outcomes for a given social and ecological context and to consider how context changes may lead to changes in vocal behavior. I find that both my observations of vocal similarity among resident birds and my results from playback experiments simulating immigrant arrival are most consistent with the convergence model predictions. Below, I discuss the implications of these results and suggest that this model could be generalized to explain signal convergence in many scenarios.

Empirical tests of model predictions. I analyzed songs used during the breeding season in a population of great tits, during which male territory defense occurs (Falls et al. 1982, McGregor et al. 1992). If a defending male confronts a newly settled territorial neighbor that has a novel song (high acoustic difference) compared to the other established territorial neighbors, the newcomer might be seen as an enhanced threat for a territory take-over, so there might be an advantage for the male with the novel song to converge its song with those of the other established territorial neighbors. The converging male could thereby reduce the chance of receiving costly misdirected aggression. This sets up a benefit for social convergence, and this benefit will be strong given the presence of multiple territorial male neighbors.
However, past studies also suggest that females prefer males that use some unfamiliar song types (McGregor and Krebs 1982) as well as males with larger repertoires (Baker et al. 1986), a trait that often correlates with using more unshared songs (Keen et al. in prep). Thus, to the extent that females are receivers, one might expect some sexually-selected divergence among male songs, opposing the selection pressures favoring convergence at least to some degree. Accordingly, in this study population, one might expect that senders should optimally make some convergence effort but not fully match neighbors. This is seen in my results: birds sharing territory boundaries had higher levels of acoustic similarity than non-neighbors, but did not approach full convergence. Additionally, I find a negative correlation between pairwise acoustic similarity and distance between current nests, but do not find this relationship when considering natal nests. This suggests that birds exert effort to converge with current neighbors, and thus adapt to changing social contexts. This effort might include acquiring songs that are acoustically similar to neighbors’ songs after dispersing, or searching for neighbors with similar songs before territory establishment. I find further support for context-dependent convergence when making comparisons between residents and immigrants. Previous work shows that great tits from different populations often use acoustically dissimilar songs (Rivera-Gutierrez et al. 2010), suggesting that upon arrival into a new population, immigrants have higher initial acoustic distance from neighbors than residents. The model predicts that because immigrants and residents experience identical social benefits of convergence, they will have the same optimal level of convergence relative to neighbors’ signals.
Therefore, immigrants and residents are predicted to rapidly attain the same levels of vocal convergence with neighbors, although achieving this will be costlier for immigrants. This prediction would also apply to dispersers that arrive from other regions of the study system. In accordance with model predictions, I find that residents, dispersers, and immigrants did not differ significantly in levels of acoustic similarity with neighbors. In other words, all individuals exhibited similarly high levels of vocal similarity with neighbors regardless of immigration status. This is in agreement with the model prediction that the optimal level of vocal convergence is a fixed distance from neighbors’ signals, regardless of the initial acoustic distance between the focal individual and neighbors. Unfortunately, because few birds were resampled in subsequent years or sampled at both the beginning and end of the breeding season, it was not possible to evaluate the change in immigrant vocal behavior over time. The observational analysis allowed me to evaluate the study population after convergence effort was made by focal individuals, which presumably happens before or during territory establishment. In contrast, the playback experiment made it possible to test predictions about changes in convergence effort in response to changing contexts. For the specific scenario of an immigrant arriving into a population of established residents, the convergence model predicts that a resident will exert little or no effort to converge with an immigrant, because doing so would cause it to diverge maladaptively from its other neighbors. This was supported by findings that resident focal birds did not adopt highly novel playback songs and exhibited little or no change in acoustic distance with the novel playback song.

Implications and broader relevance.
Taken together, the empirical results suggest that breeding great tit males adjust their singing behavior in the presence of neighboring territorial males in accordance with the convergence model. This model also predicts that low initial signal distance and low costs to convergence can facilitate the emergence of dialects. Why, then, do great tits not exhibit dialects? One possible explanation is that the sensitivity to convergence is relatively low in this species, which may correspond to the tradeoff between territory defense and mate attraction mentioned above. This presents an interesting implication of our model: when using a single mode of communication, a signaler’s peak fitness may be lower than the peak fitness that could be attained by using multimodal communication, some components of which exhibit high convergence and others of which exhibit high divergence. This therefore suggests that multimodal signaling may be favored in contexts where senders must communicate with multiple receivers. We can also consider possible outcomes in species that experience different magnitudes of social benefits and effort costs. For example, when signaling social group membership strongly influences survival because it enables group recognition, our model predicts high levels of convergence because social benefits and sensitivity are high. This aligns with evidence of vocal convergence in lekking birds (e.g., hermit hummingbirds, Kapoor 2016), group-living species (e.g., budgerigars, Farabaugh et al. 1994), and cooperative breeders (e.g., wood hoopoes, Radford 2005; superb starlings, Keen et al. 2013; western bluebirds, Akcay et al. 2014). Similarly, vocal convergence with mates may augment social benefits in pair-bonding species, as has been observed in budgerigars (Hile et al. 2000), crossbills (Sewall 2009), and ravens (Luef et al. 2017). This framework may also be applied to non-vocal signals.
For example, social insects are known to identify nest mates using cuticular hydrocarbons and are highly sensitive to differences in hydrocarbon profiles (Hölldobler and Wilson 2009). Given the high sensitivity to convergence and high social benefits of signaling group membership, our model predicts that fitness is maximized when signal distance from neighbors approaches zero. This prediction is supported by evidence that nest mates exhibit colony-specific hydrocarbon profiles that result from contact between individuals and nest materials, and are highly similar among nest mates (Lenoir et al. 1999). A different example from social insects is the use of facial patterning in social wasps, which has been shown to signal individual identity in at least one species (Tibbetts 2002). Among wasps that form dominance hierarchies, individuals that are more distinct from others receive less misdirected aggression (Sheehan and Tibbetts 2009), which aligns with the predictions of the divergence model (Dale et al. 2001). Another example in which divergence is favored is found in sciurid rodents, which have been shown to use more individually distinct vocal signals when living in larger groups because receivers must discriminate among higher numbers of senders (Pollard and Blumstein 2011). We can also consider the case of egg mimicry in brood parasites. In cases where host defenses enable receivers to be very discerning, brood parasites show high levels of convergence in egg appearance with hosts (Spottiswoode and Stevens 2011), as predicted by high sensitivity in the convergence model. We could use the same approach to describe the occurrence of mimicry to avoid aggression, or, more broadly, the maintenance of cultural norms.

Contextual importance. My results suggesting that complete convergence is non-optimal could seem inconsistent with studies showing that song matching is prevalent among territorial songbirds.
Although one function of convergence is likely song matching during counter-singing (Bradbury and Vehrencamp 2011), I suggest that this is not incompatible with the model predictions. This can be explained by considering that counter-singing occupies a particular context because it involves a different type of recognition and communication. Song matching might serve the purpose of allowing birds to assess fighting ability, e.g., by evaluating song consistency (Byers 2007, Botero et al. 2009, Rivera-Gutierrez et al. 2010). In song matching, it is also expected that neighbors have the option not to match opponents so that encounters can be de-escalated (Beecher and Campbell 2005). Thus, neighbors are not predicted to exhibit complete repertoire sharing, which is in line with model predictions. This clarification may help to explain observations of rufous-and-white wrens, a species in which both sexes sing but primarily males engage in counter-singing, meaning that the sexes experience different contexts. Males in this species exhibit only partial vocal convergence with neighbors, but levels of convergence among males were significantly higher than among females (Graham et al. 2017). The model could represent this difference as higher sensitivity to convergence among males. A similar explanation of sensitivity changing with context could be applied to reports of individuals adjusting levels of vocal convergence in different social environments, such as Diana monkeys showing higher convergence with groupmates when non-group members are nearby (Candiotti et al. 2012). In addition to social context, we might also consider how ecological context could influence levels of vocal convergence. For example, certain individuals might be more likely to occupy territories with particular habitat characteristics, as in the case of high-quality individuals holding territories in high-quality habitat.
In this scenario, individuals of similar quality might cluster together within a population. Therefore, when signals encode individual quality and there is variation in habitat quality within a system, neighbors may be more likely to use similar signals than non-neighbors. Although my study took place in a homogeneous habitat and thus territory quality was not included in my analysis, future work may benefit from considering this factor.

Future directions. The most appropriate tests of model predictions would be experimental manipulations of the signals to which a focal individual is exposed, e.g., a relocation to a distant population in which conspecific signals are very different. In this case, the convergence model would predict that the relocated focal individual (i.e., the immigrant) will adopt signals used in the local environment, rather than residents adjusting their signals. It would also be possible to test predictions with a “natural experiment” that follows focal birds from their natal sites to dispersal sites or with a meta-analysis of past observational studies.

Conclusions. Patterns of vocal convergence cannot be universally explained by any one set of social or ecological factors. The proposed framework may help to explain observed patterns of signal convergence versus divergence in many taxa by considering how these factors determine each individual’s costs and benefits for convergence versus divergence. My empirical results for great tits align with convergence model predictions, both in the case of population-wide levels of song similarity and in local interactions between residents and immigrants.

ACKNOWLEDGEMENTS

I thank H. Kern Reeve for his essential guidance during the development and writing of this paper. Thank you to Ana Verahami, Dallas Jordan, Benjamin Walton, Keith McMahon, and Sam Crofts for their enthusiasm and help during data collection and analysis.
This project was made possible by hardware and computing support provided by Holger Klinck and the Cornell Center for Conservation Bioacoustics. This project was supported by the Cornell Lab of Ornithology Athena Fund, the Edward Grey Institute of Field Ornithology, and funding from the Cornell Department of Neurobiology and Behavior. I am also grateful for many helpful suggestions during Cornell’s Animal Behavior Lunch Bunch.

WORKS CITED

Akçay, Ç., Hambury, K. L., Arnold, J. A., Nevins, A. M., and Dickinson, J. L. (2014). Song sharing with neighbours and relatives in a cooperatively breeding songbird. Animal Behaviour, 92: 55-62.
Araya-Salas, M., and Smith-Vidaurre, G. (2017). warbleR: An R package to streamline analysis of animal acoustic signals. Methods in Ecology and Evolution, 8: 184-191.
Baker, M. C. (1982). Genetic population structure and vocal dialects in Zonotrichia (Emberizidae). Acoustic communication in birds, 2: 209-235.
Baker, M. C., and Cunningham, M. A. (1985). The biology of bird-song dialects. Behavioral and Brain Sciences, 8: 85-133.
Baker, M. C., Bjerke, T. K., Lampe, H., and Espmark, Y. (1986). Sexual response of female great tits to variation in size of males' song repertoires. The American Naturalist, 128: 491-498.
Botero, C. A., Rossman, R. J., Caro, L. M., Stenzler, L. M., Lovette, I. J., de Kort, S. R., and Vehrencamp, S. L. (2009). Syllable type consistency is related to age, social status and reproductive success in the tropical mockingbird. Animal Behaviour, 77: 701-706.
Bradbury, J. W., and Vehrencamp, S. L. (2011). Principles of animal communication. 2nd ed. Sunderland, Massachusetts: Sinauer.
Byers, B. E. (2007). Extra-pair paternity in chestnut-sided warblers is correlated with consistent vocal performance. Behavioral Ecology, 18: 130-136.
Candiotti, A., Zuberbühler, K., and Lemasson, A. (2012). Convergence and divergence in Diana monkey vocalizations. Biology Letters, 8: 382-385.
Clark, C. W., Marler, P., and Beeman, K. (1987).
Quantitative analysis of animal vocal phonology: an application to swamp sparrow song. Ethology, 76: 101-115.
Dale, J., Lank, D. B., and Reeve, H. K. (2001). Signaling individual identity versus quality: a model and case studies with ruffs, queleas, and house finches. The American Naturalist, 158: 75-86.
Deecke, V. B., Ford, J. K., and Spong, P. (2000). Dialect change in resident killer whales: implications for vocal learning and cultural transmission. Animal Behaviour, 60: 629-638.
Farabaugh, S. M., Linzenbold, A., and Dooling, R. J. (1994). Vocal plasticity in Budgerigars (Melopsittacus undulatus): evidence for social factors in the learning of contact calls. Journal of Comparative Psychology, 108: 81.
Feekes, F. (1977). Colony-specific song in Cacicus cela (Icteridae, Aves): The password hypothesis. Ardea, 65: 197-202.
Firth, J. A., and Sheldon, B. C. (2015). Experimental manipulation of avian social structure reveals segregation is carried over across contexts. Proceedings of the Royal Society B: Biological Sciences, 282: 20142350.
Garland, E. C., Goldizen, A. W., Rekdahl, M. L., Constantine, R., Garrigue, C., Hauser, N. D., and Noad, M. J. (2011). Dynamic horizontal cultural transmission of humpback whale song at the ocean basin scale. Current Biology, 21: 687-691.
Gaunt, S. L., Baptista, L. F., Sanchez, J. E., and Hernandez, D. (1994). Song learning as evidenced from song sharing in two hummingbird species (Colibri coruscans and C. thalassinus). The Auk, 111: 87-103.
Graham, B. A., Heath, D. D., and Mennill, D. J. (2017). Dispersal influences genetic and acoustic spatial structure for both males and females in a tropical songbird. Ecology and Evolution, 7: 10089-10102.
Hile, A. G., Plummer, T. K., and Striedter, G. F. (2000). Male vocal imitation produces call convergence during pair bonding in budgerigars, Melopsittacus undulatus. Animal Behaviour, 59: 1209-1218.
Hölldobler, B., and Wilson, E. O. (2009).
The superorganism: the beauty, elegance, and strangeness of insect societies. New York, NY: WW Norton and Company.
Janik, V. M., and Slater, P. J. (1998). Context-specific use suggests that bottlenose dolphin signature whistles are cohesion calls. Animal Behaviour, 56: 829-838.
Johnstone, R. A. (1997). Recognition and the evolution of distinctive signatures: when does it pay to reveal identity? Proceedings of the Royal Society of London. Series B: Biological Sciences, 264: 1547-1553.
Kapoor, V. (2016). The Functional Significance of Microgeographic Dialects in a Hermit Hummingbird. PhD Dissertation, Cornell University.
Keen, S. C., Meliza, C. D., and Rubenstein, D. R. (2013). Flight calls signal group and individual identity but not kinship in a cooperatively breeding bird. Behavioral Ecology, 24: 1279-1285.
Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2015). Package ‘lmerTest’. R package version 2.0.
Lachlan, R. F., Anderson, R. C., Peters, S., Searcy, W. A., and Nowicki, S. (2014). Typical versions of learned swamp sparrow song types are more effective signals than are less typical versions. Proceedings of the Royal Society B: Biological Sciences, 281: 20140252.
Lenoir, A., Fresneau, D., Errard, C., and Hefetz, A. (1999). “The individuality and the colonial identity in ants: the emergence of the social representation concept,” in Information Processing in Social Insects, eds C. Detrain, J. L. Deneubourg, and J. Pasteels. Basel: Birkhauser, 219-237.
Luef, E. M., Ter Maat, A., and Pika, S. (2017). Vocal similarity in long-distance and short-distance vocalizations in raven pairs (Corvus corax) in captivity. Behavioural Processes, 142: 1-7.
Luther, D. A., and Derryberry, E. P. (2012). Birdsongs keep pace with city life: changes in song over time in an urban songbird affects communication. Animal Behaviour, 83: 1059-1066.
Lyon, R. H., and Ordubadi, A. (1982). Use of cepstra in acoustical signal analysis.
Journal of Mechanical Design, 104: 303-306.
MacDougall-Shackleton, S. A. (1997). Sexual selection and the evolution of song repertoires. In Current ornithology (pp. 81-124). Springer, Boston, MA.
Marler, P., and Tamura, M. (1964). Culturally transmitted patterns of vocal behavior in sparrows. Science, 146: 1483-1486.
McGregor, P. K., and Krebs, J. R. (1982). Mating and song sharing in the great tit. Nature, 297: 60-61.
McGregor, P. K., Dabelsteen, T., Shepherd, M., and Pedersen, S. B. (1992). The signal value of matched singing in great tits: evidence from interactive playback experiments. Animal Behaviour, 43: 987-998.
Nelson, D. A. (2000). Song overproduction, selective attrition and song dialects in the white-crowned sparrow. Animal Behaviour, 60: 887-898.
Payne, R. B. (1982). Ecological consequences of song matching: breeding success and intraspecific song mimicry in indigo buntings. Ecology, 63: 401-411.
Podos, J., and Warren, P. S. (2007). The evolution of geographic variation in birdsong. Advances in the Study of Behavior, 37: 403-458.
Pollard, K. A., and Blumstein, D. T. (2011). Social group size predicts the evolution of individuality. Current Biology, 21: 413-417.
Radford, A. N. (2005). Group-specific vocal signatures and neighbour–stranger discrimination in the cooperatively breeding green woodhoopoe. Animal Behaviour, 70: 1227-1234.
Reeve, H. K. (1989). The evolution of conspecific acceptance thresholds. The American Naturalist, 133: 407-435.
Rivera-Gutierrez, H. F., Matthysen, E., Adriaensen, F., and Slabbekoorn, H. (2010). Repertoire sharing and song similarity between great tit males decline with distance between forest fragments. Ethology, 116: 951-960.
Rivera-Gutierrez, H. F., Pinxten, R., and Eens, M. (2011). Difficulties when assessing birdsong learning programmes under field conditions: a re-evaluation of song repertoire flexibility in the great tit. PLoS One, 6.
Salamon, J., Jacoby, C., and Bello, J. P. (2014).
A dataset and taxonomy for urban sound research. In Proceedings of the 22nd ACM International Conference on Multimedia, 1041-1044.
Searcy, W. A., Nowicki, S., Hughes, M., and Peters, S. (2002). Geographic song discrimination in relation to dispersal distances in song sparrows. The American Naturalist, 159: 221-230.
Sewall, K. B. (2009). Limited adult vocal learning maintains call dialects but permits pair-distinctive calls in red crossbills. Animal Behaviour, 77: 1303-1311.
Sheehan, M. J., and Tibbetts, E. A. (2009). Evolution of identity signals: frequency-dependent benefits of distinctive phenotypes used for individual recognition. Evolution: International Journal of Organic Evolution, 63: 3106-3113.
Sheehan, M. J., and Reeve, H. K. (in press). Evolutionary stable investments in recognition systems explain patterns of discrimination failure and success. Philosophical Transactions B: Biological Sciences.
Sherman PW, Reeve HK, Pfennig DW, Krebs JR, Davies NB. (1997). Recognition Systems. In Behavioural ecology: an evolutionary approach. Oxford: Blackwell Science Ltd.
Snell-Rood, E. C., and Badyaev, A. V. (2008). Ecological gradient of sexual selection: elevation and song elaboration in finches. Oecologia, 157: 545-551.
Spottiswoode, C. N., and Stevens, M. (2011). How to evade a coevolving brood parasite: egg discrimination versus egg variability as host defences. Proceedings of the Royal Society B: Biological Sciences, 278: 3566-3573.
Thom, MDF, Dytham C. Female choosiness leads to the evolution of individually distinctive males. Evolution: International Journal of Organic Evolution, 66: 3736-3742.
Tibbetts, E. A. (2002). Visual signals of individual identity in the wasp Polistes fuscatus. Proceedings of the Royal Society of London. Series B: Biological Sciences, 269: 1423-1428.
Vehrencamp, S. L. (2001). Is song–type matching a conventional signal of aggressive intentions? Proceedings of the Royal Society of London. Series B: Biological Sciences, 268: 1637-1642.
Vehrencamp, S. L., Ritter, A. F., Keever, M., and Bradbury, J. W. (2003). Responses to playback of local vs. distant contact calls in the orange-fronted conure (Aratinga canicularis). Ethology, 109: 37-54.
Wiens, J. A. (1982). Song pattern variation in the sage sparrow (Amphispiza belli): Dialects or epiphenomena? The Auk, 99: 208-229.
Wilkin, T. A., Perrins, C. M., and Sheldon, B. C. (2007). The use of GIS in estimating spatial variation in habitat quality: a case study of lay-date in the Great Tit Parus major. Ibis, 149: 110-118.
Wilkinson, G. S., and Boughman, J. W. (1998). Social calls coordinate foraging in greater spear-nosed bats. Animal Behaviour, 55: 337-350.
Wolberg, G. (1990). Digital image warping. Vol. 10662. Los Alamitos, CA: IEEE Computer Society Press.
Wright, T. F. (1996). Regional dialects in the contact call of a parrot. Proceedings of the Royal Society of London. Series B: Biological Sciences, 263: 867-872.
Wright, T. F., and Dahlin, C. R. (2018). Vocal dialects in parrots: patterns and processes of cultural evolution. Emu-Austral Ornithology, 118: 50-66.

CHAPTER 4

SPATIAL AND TEMPORAL VARIATION IN SONGS IN A WILD POPULATION OF GREAT TITS, PARUS MAJOR

Sara Keen¹,²

¹ Department of Neurobiology and Behavior, Cornell University, Ithaca, NY
² Cornell Lab of Ornithology, 159 Sapsucker Woods Rd, Ithaca, NY

ABSTRACT

Vocal communication plays a crucial role in mediating social interactions in many taxa. Among species that acquire vocalizations via social learning, vocal signals often exhibit dynamic spatial variation, and signal structure may thus fluctuate on relatively short time scales. The songs of great tits, Parus major, have a simple, stereotyped structure, often comprising a single repeated two-note phrase. Across this species’ continent-wide distribution, songs exhibit a similar structure and there is no evidence for distinct geographic dialects, though the particular song types found within a population may vary over time and space.
Here, I analyze songs collected from a wild population of great tits in three consecutive years in order to investigate factors that might drive spatial and temporal variation in song on a microgeographic scale. I observed that the relative abundance of song types used in the population changed between years, and found no evidence that certain song types are used more often than expected by chance. These results are consistent with a previous study conducted in this population 40 years prior, and upon comparing my dataset to earlier records, I found that only a small proportion of previously recorded song types persist today. Although the likelihood of a song type being present in the population was positively correlated with the number of birds using the song in the preceding year, the appearance or disappearance of song types between years was nearly always explained by the appearance or disappearance of individual birds. Notably, all individuals in the study used at least one form of the common two-note song, and immigrant birds were more likely than residents to have more complex song types in their repertoires. I found that birds occupying breeding territories on forest edges used larger repertoires and shared fewer song types with the population than birds breeding in more central territories. Additionally, birds that shared few songs with the local population began breeding earlier and had larger clutches, which may be a consequence of breeding on higher quality territories. Together, these results suggest that the spatial distribution of song types may be influenced by a suite of factors, including competition for breeding territories and mates, immigration, individual survival, and nest-site characteristics. Lastly, I suggest that although there are high levels of spatial and temporal variation in song on a microgeographic scale, the ubiquity of common two-note song types may facilitate high levels of song similarity on a macrogeographic level.
INTRODUCTION

In many species, vocal communication plays an essential role in helping individuals navigate complex social environments. Bird song is a well-studied signal that is known to be a key determinant of survival and reproductive success (Catchpole and Slater 2008). In addition to being widely used for mate attraction and territory defense, songs can simultaneously signal a number of singer characteristics, including species, group and/or individual identity, and individual quality (Searcy and Andersson 1986, Kroodsma and Byers 1991, Bradbury and Vehrencamp 2011). Because songs are used to compete with or display to nearby conspecifics, and because signal transmission is mediated by habitat characteristics, it is critical that birds use songs that are well suited to both their social and ecological environment (Morton 1975, Hunter and Krebs 1979, Payne 1981, Nelson and Marler 1994, Vehrencamp 2001). However, this may be a moving target. Frequently, the songs that an individual should use to maximize its fitness depend on the precise location and timing of song production (Nelson 1992, Slabbekoorn and Smith 2002, Nordby et al. 2007). Several factors have been shown to influence temporal and spatial variation in songs, including sexual and/or natural selection, mating system, movement between populations, learning strategy, and repertoire size (Ellers and Slabbekoorn 2003, Kroodsma 2004, Derryberry 2009, Fayet 2014). Consequently, a continuum of vocal convergence is observed across species, ranging from relatively stable songs shared over large geographic scales, to songs that change from year to year and are shared by few individuals (Podos and Warren 2007, Wright and Dahlin 2018). Within passerine birds, nearby conspecifics often converge upon similarly structured songs, which can lead to intra-specific geographic variation and the emergence of dialects (Baker and Cunningham 1985).
Dialects often arise in species that have short-range dispersal and in which juveniles learn songs by imitating nearby adults (Marler and Tamura 1962, Slabbekoorn and Smith 2002, Slater 1989). Territoriality has also been identified as a key ecological correlate among passerine species that exhibit geographic variation in songs, likely because the maintenance of territory boundaries often leads to song-sharing between adjacent males (McGregor and Krebs 1989, Kroodsma 2004, Beecher and Brenowitz 2005). A number of studies have proposed possible functions of vocal convergence, including enabling recognition of group members and excluding non-members (Feekes 1977), facilitating assortative mating with individuals that are most fit for the local habitat (Marler and Tamura 1962), or encouraging adaptation to local environments (Slabbekoorn 2004). In all cases, the benefits an individual receives from resembling conspecifics and the amount of vocal convergence required to obtain benefits will be determined by unique ecological and social characteristics of the population (see Chapter 3). Among avian species that use repertoires comprising several song types, geographic variation in songs may take the form of song sharing as well as high levels of similarity in acoustic structure of songs (McGregor and Krebs 1989, Nelson 1992, Beecher and Brenowitz 2005, Rivera-Gutierrez et al. 2010a). Because songbirds typically acquire songs via copying rather than innovation, they are usually limited to using songs of conspecific singers that were overheard during the sensitive period for vocal learning, although some variation may be introduced through copying errors (Slater 1989, Ellers and Slabbekoorn 2003, Slater and Lachlan 2003, Beecher and Brenowitz 2005). Unlike repertoire composition, repertoire size is thought to be a sexually selected trait in many species (Searcy 1992, MacDougall-Shackleton 1997, Catchpole and Slater 2008).
An individual’s ability to learn and produce multiple song types may correlate with age, experience, or developmental health, and therefore may be an honest indicator of quality in several species (Searcy and Nowicki 2005, Catchpole and Slater 2008, but see Byers and Kroodsma 2009). However, in species with small repertoires, the range of songs that an individual can produce is ultimately limited by learning opportunities and innate constraints (Gil and Gahr 2002), and repertoire composition is often further refined through selective attrition to best suit an individual’s current social milieu (Marler and Peters 1982, Lachlan et al. 2018).

Movement and connectivity between populations can profoundly influence geographic variation in songs (Podos and Warren 2007). High levels of connectivity between populations, in the form of immigration, dispersal, or other manners of gene flow, have been shown to correlate with increased vocal similarity (MacDougall-Shackleton and MacDougall-Shackleton 2001). Conversely, lack of movement between populations is linked to increased vocal divergence, and previous studies have demonstrated that this may act as a barrier to reproduction and may promote speciation (Slabbekoorn and Smith 2002, Price 2008, Freeman and Montgomery 2017). However, among species that acquire vocalizations through social learning, geographic variation in song can often be maintained despite gene flow between populations (Ellers and Slabbekoorn 2003, Wright et al. 2005). Therefore, spatial patterns of variation do not always accurately reflect connectivity between populations, particularly among species that are capable of acquiring songs both before and after dispersal (Podos and Warren 2007).
The relationship between individual movement within and between populations and microgeographic variation in song is not well understood, and few studies have explored how these processes might in turn influence spatial and temporal variation in songs on a larger scale. To address these questions, I investigated variation in songs collected from a wild population of great tits (Parus major) in three consecutive years. Males of this species have repertoires comprising two to nine unique song types, with each song being a series of identical, repeated phrases composed of a unique combination of notes (Krebs 1976, Krebs 1977; Figure 1). Throughout their extensive range, which spans Europe and extends into the Middle East and parts of North Africa (BirdLife International 2020), great tits are known to use stereotyped songs composed of a repeated two-note phrase, often referred to with the mnemonic “teacher” (Alexander 1935, Zollinger et al. 2017). Although many variants of this song exist, the distinctive pattern of two alternating high and low frequency notes is commonly used for species identification (Thomas 2019). Great tits also use song variants composed of phrases with three to five notes, but these account for a smaller proportion of songs observed in this species (McGregor and Krebs 1989). Past studies have suggested that habitat can influence the acoustic structure of great tit songs, with forest-dwelling birds having lower frequency songs than woodland birds (Hunter and Krebs 1979), and birds in urban areas singing at higher frequencies and using shorter notes than birds in rural areas (Slabbekoorn and Peet 2003, Slabbekoorn and den Boer-Visser 2006). Multiple studies have also demonstrated that repertoire size is positively correlated with survival and lifetime reproductive success (McGregor et al. 1981, Lambrechts and Dhont 1986, Rivera-Gutierrez et al.
2010b), and that song sharing between males decreases with distance between territories (McGregor and Krebs 1982, Rivera-Gutierrez et al. 2010a).

Figure 1. Spectrograms of great tit songs collected from the study population. Vertical dashed lines represent separations between sound files. Each sound file contains a different song type. Songs in the top two rows are composed of two-note phrases. Songs in the final row are composed of phrases with more than two notes and are termed complex songs in my analysis. Complex songs are infrequently shared with other birds in the study population. Spectrograms were generated in Raven Pro 1.5 using the parameters described below.

I analyzed songs used by tagged great tit males in consecutive years in order to determine the relative abundance of song types within the population and the extent to which these distributions varied over time and space. Using these data, I explore how individual song characteristics vary with immigration status and reproductive success and investigate the role of individual movement and survival in microgeographic variation in songs. I also compare these findings with a previous study of songs in this population conducted by McGregor and Krebs (1982). Finally, I suggest possible explanations for the observed patterns in my data and make recommendations for future studies.

METHODS

Study population and data collection. I collected recordings of songs from male great tits in a wild population in Wytham Woods, Oxfordshire, UK (51°46′ N, 01°20′ W) between 1 and 30 April in 2017-2019. Great tits in this population are part of a long-term breeding study begun in 1947 for which 1,018 woodcrete nest boxes were placed within the boundaries of the study site (Figure 2). Great tits in this population have been shown to breed almost exclusively in nest boxes (Hinde 1952, Firth and Sheldon 2015).
More than 80% of the great tits in this population have been fitted with metal leg rings from the British Trust for Ornithology and an additional leg band with an identifiable passive integrated transponder (PIT) tag produced by IB Technology, Aylesbury, U.K. Within Wytham Woods, great tits spend winters in mixed-species foraging flocks and often begin visiting and claiming breeding territories 4-6 weeks prior to the onset of egg laying every spring (Firth and Sheldon 2015). Individual lifespan ranges from 1 to 9 years, with the majority of birds surviving for one or two breeding seasons (Wilkin 2006). Great tits from nearby woodlands often immigrate into the study area and typically account for approximately one third of the breeding population in a given year (Wilkin 2006, Fayet et al. 2014). This population also experiences high levels of nest predation and offspring mortality; on average, approximately one offspring per breeding individual survives to breed in the following year (McCleery et al. 2004).

Figure 2. Map of Wytham Woods. Map areas with grey background represent land within study system boundaries. Axes represent Ordnance Survey National Grid coordinates and axis ticks indicate distance in meters. Points show locations of nest boxes; black points are nest boxes that were included in the study and gray points are nest boxes that were not included.

Great tit males typically acquire songs from nearby adults, including but not limited to their social fathers (McGregor and Krebs 1982). The sensitive period for song learning extends into the first breeding season and possibly beyond this point, meaning that males’ repertoires can include songs acquired both before and after dispersal (Franco and Slabbekoorn 2009). During the breeding season, males sing a dawn chorus near their nest box before their female mate emerges each morning.
This display is closely synchronized with female fertility and male dawn chorus output reaches its maximum near the onset of female egg laying (Mace 1987). In addition to their ubiquitous two-note songs, which can be categorized into many different song types, great tits can also use song types that consist of repeated sequences of more than two notes, hereafter referred to as complex songs (Krebs 1976, 1977a; Figure 1). Males frequently produce the same song type multiple times in succession, often cycling through all songs in their repertoires during their dawn chorus display (Lambrechts and Dhont 1986, Naguib et al. 2019, but see Rivera-Gutierrez et al. 2011). I monitored great tits using nest boxes in Marley Plantation, the southeasternmost corner of the study site (Figure 2). Field assistants visited nest boxes multiple times per week and identified all nests occupied by great tit pairs by catching birds in order to obtain identifying information from PIT tags. Using data collected from the long-term study, it was possible to reference records of natal nests of all birds in the population to classify males as residents, immigrants, dispersers, or of unknown origin. I defined residents as birds that were born in Wytham Woods or were previously recorded breeding within the system, immigrants as birds that were not born in Wytham and had not bred previously in the woods, and dispersers as birds that were born in Wytham Woods outside of Marley Plantation (the location of this study) and had not previously bred in this region. Birds that could not be caught or identified, including untagged birds, were classified as unknown. The sample included a total of 54 birds (birds sampled per year: 2017: N = 21, 2018: N = 16, 2019: N = 17; Table 1). Five birds from 2017 were resampled in 2018, and two birds sampled in 2018 were resampled in 2019. No birds were sampled in all three years.

Acoustic data collection and analysis.
Acoustic recordings were collected by placing Swift recording units (Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Ithaca, NY) directly below nest boxes occupied by breeding pairs of great tits. Recorders were programmed to record continuously from 0500-1000 daily, and recordings were saved as hour-long WAV files using a 32 kHz sampling rate and 16-bit depth. I identified songs from focal males by using only the songs that met an amplitude threshold of 75 dB (as calculated by the peak power measurement for individual Raven selections) to ensure that songs were produced within close proximity to the nest. Throughout data collection, field assistants regularly observed dawn chorus displays visually and recorded them using a Sennheiser shotgun microphone (Sennheiser Electronic, Old Lyme, CT) and a Marantz PMD661 digital recorder (Marantz, Mahwah, NJ) to confirm that songs captured by Swift recording units were indeed produced by focal males. Additionally, I compared recordings of songs produced at the same nest on different days to confirm that songs were produced by the same bird, as great tits often exhibit consistent repertoire composition between days (Naguib et al. 2019).

For each male included in my sample, I selected three consecutive mornings of recordings to use for acoustic analysis. All samples were collected either during the egg-laying period of a focal male’s mate, or in the three days preceding egg laying. Spectrograms of recordings were created with Raven Pro 1.5 (Bioacoustic Research Program 2015) using a Hann window function with a 512-sample window and 50% overlap. Trained research assistants manually reviewed spectrograms and created Raven selections around all songs in an individual’s dawn chorus display. The dawn chorus was defined as any song produced within 90 min of civil twilight, as in Mace (1987).
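To make the spectrogram settings above concrete, the short Python sketch below works out the time and frequency resolution they imply. It is an illustration only and not part of the original analysis pipeline; it assumes that the DFT size equals the 512-sample window length, which the text does not state, and the function name is hypothetical.

```python
# Recording and spectrogram settings stated in the text.
SAMPLE_RATE_HZ = 32_000
WINDOW_SAMPLES = 512
OVERLAP_FRACTION = 0.5


def spectrogram_resolution(sample_rate_hz, window_samples, overlap_fraction):
    """Return the time and frequency resolution implied by the settings.

    Assumption (not stated in the text): the DFT size equals the window
    length, so frequency bins are spaced sample_rate / window_samples apart.
    """
    # Number of samples the window advances between successive frames.
    hop_samples = round(window_samples * (1 - overlap_fraction))
    return {
        "window_ms": 1000 * window_samples / sample_rate_hz,
        "hop_ms": 1000 * hop_samples / sample_rate_hz,
        "bin_spacing_hz": sample_rate_hz / window_samples,
        "nyquist_hz": sample_rate_hz / 2,
    }


resolution = spectrogram_resolution(SAMPLE_RATE_HZ, WINDOW_SAMPLES, OVERLAP_FRACTION)
for name, value in resolution.items():
    print(f"{name}: {value}")
```

Under these assumptions, each analysis window spans 16 ms, successive frames are 8 ms apart, and frequency bins are spaced 62.5 Hz apart up to a 16 kHz Nyquist limit.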
I identified the song types in each male’s dawn chorus repertoire using the method described in Chapter 2. I then used Raven Pro 1.5 to measure the signal-to-noise ratio of all songs in a focal male’s repertoire and selected the five highest-quality exemplars of each song type from each individual. This resulted in an 865-song dataset. I then used the method described in Chapter 2 to construct an acoustic feature space within which songs were distributed. I also conducted a visual analysis of spectrograms to determine whether song types were composed of two-note phrases or of phrases with more than two notes, i.e., complex songs (Figure 1).

Statistical analyses. Several analyses were used to evaluate spatial and temporal variation in songs. First, I calculated the number of unique song types that individuals used, i.e., repertoire size. For every year in the sample, I created distributions of the number of birds using each song type and calculated song type densities (i.e., relative abundance) as the number of birds using a song type divided by the sum, across all unique song types in that year, of the number of birds using each song type. I used a Chi-squared test to determine whether the distribution of song type densities differed from a broken stick distribution (MacArthur 1957), which is considered a null distribution and allows for comparison to a previous study of this population (McGregor and Krebs 1982). To test whether the number of birds using a song type in a given year influenced the number of birds using that song type the following year, I used a linear mixed model (LMM) with song type presence in the current year as a dependent variable, the number of birds using the song type in the next year as a fixed effect, and current year (2017 or 2018) and song type as random effects. Presence was defined as at least one bird using a song type.
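The song type density calculation and the broken stick expectation described above can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis code (the analyses in this chapter were run in R). The bird identifiers and song type labels are hypothetical, and the broken stick expectation uses the standard form of MacArthur's model, in which the expected share of the i-th most abundant of S types is (1/S) times the sum of 1/k for k from i to S.

```python
from collections import Counter


def song_type_densities(repertoires):
    """Relative abundance of each song type in one year.

    repertoires: dict mapping a bird ID to the set of song types it used.
    Density = birds using the type / sum over all types of birds using each type.
    """
    counts = Counter(song for songs in repertoires.values() for song in set(songs))
    total = sum(counts.values())
    return {song: n / total for song, n in counts.items()}


def broken_stick(n_types):
    """Expected relative abundances, most to least abundant, for n_types types."""
    return [
        sum(1 / k for k in range(i, n_types + 1)) / n_types
        for i in range(1, n_types + 1)
    ]


# Hypothetical example year with three birds and three song types.
example_year = {
    "bird_a": {"type_1", "type_2"},
    "bird_b": {"type_1"},
    "bird_c": {"type_1", "type_3"},
}
densities = song_type_densities(example_year)
observed = sorted(densities.values(), reverse=True)
expected = broken_stick(len(densities))
print(observed)
print(expected)
```

The observed and expected vectors, both ordered from most to least abundant, are the quantities that a goodness-of-fit test such as the Chi-squared test would compare.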
I also calculated the Pearson correlation between the number of birds using a song type in the preceding year and the number of birds using that song type in the current year. I did not remove birds that were sampled in consecutive years for either analysis.

To evaluate whether neighbors share more songs than expected by chance, I first used Thiessen polygons (Aurenhammer 1991) to calculate territories around all great tit nests in the study area each year, regardless of whether they were included in my sample. This technique has previously been shown to correlate closely with territory boundaries (Wilkin et al. 2007a, Firth and Sheldon 2016). I then classified pairs of birds within the same year as neighbors if they shared a territory boundary, and as non-neighbors otherwise. I used the Dice similarity index (Dice 1945) to calculate pairwise repertoire similarity between all birds within the same year. For every song in a bird’s repertoire, I determined whether other birds used the song type (assigning 0 or 1 for song absence or presence, respectively), and calculated the probability of neighbors and non-neighbors sharing the song type by finding the mean within each group. Comparisons were made only between birds sampled in the same year. I then used two LMMs with pairwise repertoire similarity and probability of sharing a song as dependent variables, identity of the other bird (either neighbor or non-neighbor) as a fixed effect, and reference bird identity as a random effect.

To test whether repertoire composition was correlated with nest location, I first developed a metric to evaluate individual acoustic dissimilarity relative to the rest of the population. To calculate this, I found the median distance of all songs in a bird’s repertoire from the centroid of the acoustic feature space.
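The two per-bird metrics introduced above, Dice repertoire similarity and acoustic dissimilarity, can be illustrated with a minimal Python sketch. It is not the original analysis code. It assumes that the centroid of the acoustic feature space is the coordinate-wise mean of all songs and that distances are Euclidean, neither of which is specified in this chapter; the function names, song labels, and coordinates are hypothetical.

```python
import math
from statistics import median


def dice_similarity(repertoire_a, repertoire_b):
    """Dice index: 2 * shared song types / (size of A + size of B)."""
    a, b = set(repertoire_a), set(repertoire_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty repertoires
    return 2 * len(a & b) / (len(a) + len(b))


def centroid(points):
    """Coordinate-wise mean of all points (assumed definition of the centroid)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def acoustic_dissimilarity(bird_points, space_centroid):
    """Median Euclidean distance of a bird's songs from the centroid."""
    return median(math.dist(p, space_centroid) for p in bird_points)


# Hypothetical repertoires sharing two of three song types each.
similarity = dice_similarity({"A", "B", "C"}, {"B", "C", "D"})

# Hypothetical two-dimensional feature space containing five songs.
all_songs = [(0, 0), (2, 0), (0, 2), (2, 2), (6, 1)]
space_centroid = centroid(all_songs)
dissimilarity = acoustic_dissimilarity([(2, 0), (6, 1)], space_centroid)
print(similarity, space_centroid, dissimilarity)
```

In this toy example the two repertoires share two song types out of six listed in total, giving a Dice index of two thirds, and the example bird's songs lie 1 and 4 units from the centroid, giving a median distance of 2.5.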
Because an inherent property of the acoustic feature space is that more common songs are in the center of the space and less common songs are farther from the center, this median distance is a proxy for how different a bird sounds from the local population. I used a Pearson correlation to test whether nest distance from the edge of the woods was correlated with acoustic dissimilarity. To further explore the relationship between song repertoire and nest location, I used an analysis of variance (ANOVA) to test whether the number of multi-note songs (i.e., songs with phrases composed of more than two notes) in a bird’s repertoire predicted nest distance from the forest edge, as well as a Pearson correlation to compare repertoire size and nest distance from the forest edge. Lastly, to test whether repertoire size was correlated with the use of complex songs, I used an LMM with repertoire size as the response variable, presence of complex songs as a fixed effect (binary), and bird identity as a random effect.

To determine whether immigration status influenced singing behavior, I used four separate ANOVAs to test whether residents, immigrants, and dispersers differed in acoustic dissimilarity from the local population, nest distance from forest edge, repertoire size, and number of complex songs used. Birds with unknown immigration status were excluded from this analysis.

Finally, I investigated the relationship between repertoire composition and breeding success by using two separate Pearson correlation tests to test whether acoustic dissimilarity was correlated with clutch size or the onset of egg laying. Clutch size was defined as the number of eggs that hatched in a bird’s nest box, and onset of egg laying was calculated as days from April 1 on which the first egg hatched. To avoid sampling birds multiple times, I used records from only the first year in which a bird was included in the sample.
All statistical analyses were carried out in R (R Core Team, 2018) using the packages PCDimension (Coombes et al. 2019) and lmerTest (Kuznetsova et al. 2015).

RESULTS

I found that there were 37 distinct song types used in the study population during 2017-2019 (mean ± SE song types per year: 30.66 ± 4.48; see Appendix C for further details). Birds in the sample had repertoire sizes of 3.63 ± 0.32 (mean ± SE). In all years, a small number of song types were used by several birds and many song types were used by only one or two individuals (Figure 3a). The relative abundance of particular song types changed between years, meaning that songs used by many birds in a given year were not necessarily used by many birds in the subsequent year. However, the distribution of relative abundance of all song types used in the population did not change significantly between years and did not differ from a null distribution (Chi-squared test: 2017: χ²(32) = 0.011, p = 1; 2018: χ²(26) = 0.08, p = 1; 2019: χ²(21) = 0.056, p = 1; Figure 3b), and therefore certain song types were not more or less abundant than would be expected by chance. Nine of the 37 songs observed in the population were complex songs that were composed of phrases with more than two notes.

Figure 3. Song types used in study system during 2017-2019. The analysis found N = 37 unique song types. Plots show a) the number of individuals using each song type per year and b) the proportion of birds using each song type, with song types ordered from most to least abundant within each year. The dashed line in (b) indicates the expected abundance of song types using a null distribution as predicted by a broken stick model.

In both 2018 and 2019, novel song types appeared and previously used song types disappeared (Table 1). In total, there were 14 song types that “disappeared” between years and, in all cases, these were previously used by birds that were not resampled in the subsequent year.
In 2018 and 2019, there were three song types that appeared in the sample that were not recorded in the previous year, and in two cases these songs were used by previously unsampled birds. Thus, in almost all cases, song turnover between years (i.e., appearance or disappearance of song types) was linked to bird turnover. Although nine of the 48 song types recorded in this population 40 years previously by McGregor and Krebs (1982) persist today, the majority of previously recorded songs (39 song types) were not observed in this study. Among the nine song types that were recorded previously, seven were songs composed of two-note phrases and two were complex songs. None of the nine songs that were found in both my study and the earlier study were previously classified as “rare” songs (McGregor and Krebs 1982).

Table 1. Summary of birds and song types included in sample each year. The letters R, I, D, and U indicate residents, immigrants, dispersers, and unknown individuals, respectively.

Year  Unique song types  Total birds  R   I  D  U  Song types appeared in current year  Song types disappeared from previous year
2017  33                 21           13  4  2  2  -                                    -
2018  27                 16           9   1  0  6  1                                    7
2019  22                 17           10  6  1  0  2                                    7

I did not find that the number of birds using a song type in the previous year was a significant predictor of song presence in the following year, though the trend was in the predicted direction (LMM: t(69.13) = 1.72, N = 72, p = 0.09; Figure 4a). However, the number of birds using a song type in the previous year was positively correlated with the number of birds using that song type in the current year (Pearson correlation: r = 2.44, N = 74, p = 0.017; Figure 4b). Neighbors were significantly more likely to have higher levels of repertoire similarity and share song types than non-neighbors (LMM: repertoire similarity: t = 3.1, N = 54, df = 54.91, p = 0.003; song sharing: t = 3.15, N = 320, df = 258.76, p = 0.002; Figure 5).

Figure 4. Song carryover between years.
a) Song types present in the current year were used by more birds in the previous year, but this difference was not significant (t = 1.72, p = 0.09); b) there is a significant correlation between the number of birds using a song type from one year to the next (r = 2.44, p = 0.017).

Figure 5. Vocal similarity between neighbors and non-neighbors. Neighbors were significantly more likely to a) have higher pairwise repertoire similarity (t = 3.1, p = 0.003), and b) share song types (t = 3.15, p = 0.002) than non-neighbors.

Birds that occupied breeding territories closer to forest edges used songs that were more dissimilar to the rest of the study population, as shown by several metrics. First, birds with nests closer to forest edges were closer to the edge of the acoustic feature space (Pearson correlation: r = -0.31, N = 54, p = 0.025; Figure 6a). Additionally, birds closer to forest edges had more complex song types in their repertoires (ANOVA: F(1) = 4.87, p = 0.033; Figure 6b) and had larger repertoires (Pearson correlation: r = -2.48, N = 54, p = 0.016; Figure 7a). Birds with larger repertoires were also more likely to use a complex song type (t-test: t = 2.55, df = 49.78, p = 0.014; Figure 7b). Only one bird in the sample used two complex song types, and this individual had a repertoire size of six song types. Plots showing differences in repertoire size among territories are shown in Figure C1.

Figure 6. Birds on forest edges sound more dissimilar to other birds in the population. Birds nearer to edges a) had higher acoustic dissimilarity (r = -0.31, p = 0.025), and b) used more complex song types (F = 4.87, p = 0.033).

Figure 7. Repertoire size is correlated with nest location and repertoire composition. Birds with larger repertoires a) had nests closer to forest edges (r = -2.48, p = 0.016), and b) were more likely to use a complex song type (t = 2.55, p = 0.014).
Acoustic dissimilarity did not differ among residents, immigrants, and dispersers (ANOVA: F(2,43) = 0.81, p = 0.45; Figure 8a), nor was any class more likely to breed closer to the forest edge (ANOVA: F(2,43) = 2.05, p = 0.15; Figure 8b). Although immigrants and dispersers tended to have larger repertoires than resident birds, this difference was not significant (ANOVA: F(2,43) = 2.03, p = 0.15; Tukey post hoc tests: residents vs. immigrants: p = 0.15; residents vs. dispersers: p = 0.64; immigrants vs. dispersers: p = 0.98; Figure 8c). However, I did find that immigrants were more likely than residents to use complex song types (ANOVA: F(2,43) = 3.49, p = 0.039; Tukey post hoc tests: residents vs. immigrants: p = 0.044; residents vs. dispersers: p = 0.44; immigrants vs. dispersers: p = 0.98; Figure 8d). Plots of immigrant, resident, and disperser occupancy of territories in all years are shown in Figure C2.

Figure 8. Comparisons between residents, immigrants, and dispersers. Immigration status is not a significant predictor of a) acoustic dissimilarity (F = 0.81, p = 0.45), b) nest distance from forest edge (F = 2.05, p = 0.15), or c) repertoire size (F = 2.03, p = 0.15), but d) immigrants are more likely than residents to use complex song types (p = 0.044).

Lastly, I observed that males with higher levels of acoustic dissimilarity had larger clutches (Pearson correlation: r = 2.34, N = 45, p = 0.025; Figure 9a) and mates that began egg laying earlier (Pearson correlation: r = -2.0, N = 46, p = 0.05; Figure 9b). However, when analyzing acoustic dissimilarity versus lay date separately for each year, this relationship was significant only in 2017 (2017: r = -2.36, N = 21, p = 0.029; 2018: r = -2.84, N = 10, p = 0.43; 2019: r = -0.34, N = 15, p = 0.74).

Figure 9. Acoustic dissimilarity may correlate with breeding success.
Males that were more acoustically dissimilar from the local population (y-axis) had mates that a) had larger clutches (Pearson correlation: r = 2.34, p = 0.025), and b) began egg laying earlier (r = -2.0, p = 0.05), though when analyzed separately this relationship was only significant in one year of the study (2017: p = 0.03, 2018: p = 0.16, 2019: p = 0.8).

DISCUSSION

This study set out to evaluate how the songs used within a population vary over time and space and to identify the factors that might underlie this variation. I found that both the presence and abundance of unique song types changed every year, and that this appears to be linked to arrivals and deaths of birds in the study population. In accordance with previous studies of great tits, I found high levels of acoustic similarity between neighboring birds, but also noted that birds breeding near forest edges were more dissimilar to the population than centrally breeding birds. Additionally, although immigrants more often used complex song types, this alone did not explain the tendency for complex songs to be found near forest edges. Lastly, I observed that more acoustically dissimilar birds had larger clutches, but caution against overinterpretation of this finding. I consider potential explanations for these results and discuss the implications of these findings below.

Relative abundance of song types. I found no evidence of particular song types being more or less common than expected by chance in my study population. This pattern was also reported by a similarly designed study of chaffinches (Slater et al. 1980) as well as an earlier study of great tits conducted in a different region in Wytham Woods (McGregor and Krebs 1982). The observation that song type abundance fits a null distribution suggests that birds do not obtain a selective advantage by using particular songs.
However, certain conditions increase the likelihood of song types appearing in a given year, namely the number of birds using that song in the previous year, although even more widely used songs were found to disappear between years (Figure 4).

Several factors may help to explain the observed levels of temporal variability in songs. One likely driver of this variability is the demography of the study population. Great tit nests in Wytham Woods experience high levels of predation and offspring mortality, and an average of only one offspring per breeding pair goes on to breed in the population in the subsequent year (McCleery et al. 2004). Additionally, immigrants typically account for approximately one third of the population in the study area (Fayet et al. 2014), and 23% of birds in my sample were immigrants or dispersers. Although great tits can acquire songs after dispersing to breeding territories (Franco and Slabbekoorn 2009), previous work suggests that immigrants more often use songs that are rare or more dissimilar from those of local birds (McGregor and Krebs 1982, Fayet et al. 2014). Given the high levels of immigration and individual turnover and the low levels of offspring survival in this system, the likelihood of a meme (here, a song type) surviving from one breeding season to the next may be relatively low.

It may also be possible that the process of song learning in great tits contributes to the observed levels of temporal variability. Previous work has suggested that songs may be acquired through a process of overproduction and selective attrition, meaning that song types might be present in a population but not observed, as they are not part of a bird’s current functional repertoire (Franco and Slabbekoorn 2009). In this case, the interaction between birds’ repertoires and their current social environment may prevent us from knowing which latent songs are present in the population.
However, because songs nearly always appeared and disappeared from my sample with the appearance and disappearance of individuals, my data do not support this hypothesis.

Although approximately 25% of the song types recorded in this study were also found in this system 40 years previously, the lack of continuous data collection makes it impossible to conclude that these song types have persisted continuously over time. Regardless of whether this fraction of songs has remained stable in the population, the majority of songs used in this system previously have disappeared. This appears to be in contrast to vocal learning species that exhibit stable dialects, such as swamp sparrows (Lachlan et al. 2018) and white-crowned sparrows (Nelson et al. 2004), although methodological differences (e.g., measuring syllables rather than song types) might account for these differences. The songs of several other species have been shown to change over time, particularly when evaluated over several years or decades (e.g., chaffinches, Ince et al. 1980; budgerigars, Farabaugh et al. 1994; sparrows, Kopuchian et al. 2004; and chickadees, Baker and Gammon 2006).

Interestingly, all birds in this study used at least one variant of a two-note song, which may be linked to the ubiquity of these song types across the species’ distribution. It is also possible that there is some positive feedback between the high abundance of these songs and their use for territory defense (Krebs et al. 1981) that might contribute to their persistence. Additionally, because great tits have been shown to favor songs which best transmit through their environments (Hunter and Krebs 1979), it is possible that songs composed of two-note phrases transmit more effectively in Wytham Woods, and are therefore more widely used.

Spatial distribution of song types and the effect of immigrant status.
Birds that occupied territories closer to forest edges also had larger repertoires and exhibited higher levels of acoustic dissimilarity. In other words, edge birds used more song types, and often their songs were dissimilar to those of other birds in the population. Previous studies have found that immigrant birds more often nest near forest edges in this system (Wilkin et al. 2007b), and that immigrants often share fewer songs with the local population (McGregor and Krebs 1982). Therefore, one might expect that the finding of dissimilar songs being used near forest edges could be explained by immigrants more often breeding on edge territories. However, although immigrant birds more often used complex song types than residents, birds that bred closer to edges had higher acoustic dissimilarity regardless of immigration status (Figure 8). A possible explanation for this is that birds occupying edge territories have fewer neighbors, and thus birds with dissimilar songs may fare better on edges as they maintain territory boundaries with, and therefore may counter sing with, fewer individuals.

Why were immigrant birds more likely than residents to use complex song types? A likely explanation is that immigrants acquired these songs in their natal territories, and that birds born in Wytham were not exposed to complex songs during the sensitive period of song learning. It might also be possible that immigrants dispersed from habitats with different acoustic properties in which complex songs transmit more efficiently. Regardless of the drivers of differences in immigrant songs, my results support findings from previous studies suggesting that movement between populations is a driver of local song diversity (Fayet et al. 2014).

Past studies have shown that repertoire similarity decreases with distance in great tits (Rivera-Gutierrez et al. 2010a), and my findings of high levels of song sharing between neighbors further corroborate this work.
In many species, nearby conspecifics converge upon similar vocalizations, and this may be influenced by several factors including dispersal tendencies, vocal learning process, and territoriality (Slabbekoorn and Smith 2002, Podos and Warren 2007). These drivers of vocal convergence may help to explain why high levels of acoustic similarity persist in this population despite yearly turnover in songs.

Repertoire size, composition, and reproductive success. Intriguingly, I found that males with higher levels of acoustic dissimilarity had female mates that began egg laying earlier and produced larger clutches. The correlation between onset of egg laying and clutch size has been shown previously in this population (Perrins and McCleery 1989), though it is unclear whether using dissimilar songs enables males to attract mates and claim territories earlier in the season. If using dissimilar songs indeed enabled such an advantage, acoustically dissimilar males might be expected to begin breeding sooner. However, these results must be interpreted with caution for several reasons. First, female immigration status has been shown to correlate with earlier egg laying (Wilkin et al. 2007b). Additionally, given the role of song sharing in male-male competition, further studies are needed to better understand the fitness consequences of acoustic dissimilarity. For example, in song sparrows, song sharing with neighbors has been shown to correlate with the amount of time a male holds a territory, and song sharing is known to play an important role in intra-specific competition in great tits (McGregor et al. 1992, Falls et al. 1982, Peake et al. 2005). Lastly, it is not possible to determine the direction of causality, i.e., whether using dissimilar songs enables birds to claim territories sooner or more easily attract mates, or whether birds of higher quality can learn a greater variety of songs because they are healthier or live longer.

Conclusions.
I found annual changes in the presence and abundance of songs used in the study population, and suggest that this temporal variation can be only partially explained by song usage in the preceding year. I also found high levels of vocal similarity between neighbors and observed that both birds near forest edges and immigrants had higher levels of acoustic dissimilarity. Together, these findings support previous work showing that songbirds do not exhibit a static tendency to use particular song types, and suggest that territoriality, habitat characteristics, and immigration may contribute to spatial and temporal variation in songs.

WORKS CITED

Alexander, H. G. (1935). A chart of bird song. British Birds, 29: 190-198.
Aurenhammer, F. (1991). Voronoi diagrams: a survey of a fundamental geometric data structure. Computing Surveys, 23: 345-405.
Baker, M. C., and Cunningham, M. A. (1985). The biology of bird-song dialects. Behavioral and Brain Sciences, 8: 85-100.
Baker, M. C., and Gammon, D. E. (2006). Persistence and change of vocal signals in natural populations of chickadees: annual sampling of the gargle call over eight seasons. Behaviour, 1473-1509.
Beecher, M. D., Campbell, S. E., and Nordby, J. C. (2000). Territory tenure in song sparrows is related to song sharing with neighbours, but not to repertoire size. Animal Behaviour, 59: 29-37.
Beecher, M. D., and Brenowitz, E. A. (2005). Functional aspects of song learning in songbirds. Trends in Ecology and Evolution, 20: 143-149.
Bioacoustics Research Program. (2011). Raven Pro: interactive sound analysis software. Version 1.5. The Cornell Lab of Ornithology. Ithaca, NY.
Bradbury, J. W., and Vehrencamp, S. L. (2011). Principles of animal communication. Sunderland, MA: Sinauer.
Byers, B. E., and Kroodsma, D. E. (2009). Female mate choice and songbird song repertoires. Animal Behaviour, 77: 13-22.
Catchpole, C. K., and Slater, P. J. (2008). Bird song: biological themes and variations.
Cambridge University Press.
Coombes, K. R., Wang, M., and Coombes, M. K. R. (2019). Package PCDimension. R package version 1.0.
Derryberry, E. P. (2009). Ecology shapes birdsong evolution: variation in morphology and habitat explains variation in white-crowned sparrow song. The American Naturalist, 174: 24-33.
Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26: 297-302.
Ellers, J., and Slabbekoorn, H. (2003). Song divergence and male dispersal among bird populations: a spatially explicit model testing the role of vocal learning. Animal Behaviour, 65: 671-681.
Falls, J. B., Krebs, J. R., and McGregor, P. K. (1982). Song matching in the great tit (Parus major): the effect of similarity and familiarity. Animal Behaviour, 30: 997-1009.
Farabaugh, S. M., Linzenbold, A., and Dooling, R. J. (1994). Vocal plasticity in Budgerigars (Melopsittacus undulatus): evidence for social factors in the learning of contact calls. Journal of Comparative Psychology, 108: 81.
Fayet, A. L., Tobias, J. A., Hintzen, R. E., and Seddon, N. (2014). Immigration and dispersal are key determinants of cultural diversity in a songbird population. Behavioral Ecology, 25: 744-753.
Feekes, F. (1977). Colony-specific song in Cacicus cela (Icteridae, Aves): The password hypothesis. Ardea, 65: 197-202.
Firth, J. A., and Sheldon, B. C. (2015). Experimental manipulation of avian social structure reveals segregation is carried over across contexts. Proceedings of the Royal Society of London. Series B: Biological Sciences, 282: 20142350.
Firth, J. A., and Sheldon, B. C. (2016). Social carry-over effects underpin trans-seasonally linked structure in a wild bird population. Ecology Letters, 19: 1324-1332.
Franco, P., and Slabbekoorn, H. (2009). Repertoire size and composition in great tits: a flexibility test using playbacks. Animal Behaviour, 77: 261-269.
Freeman, B. G., and Montgomery, G. A. (2017).
Using song playback experiments to measure species recognition between geographically isolated populations: A comparison with acoustic trait analyses. The Auk: Ornithological Advances, 134: 857-870.
Gil, D., and Gahr, M. (2002). The honesty of bird song: multiple constraints for multiple traits. Trends in Ecology and Evolution, 17: 133-141.
Hunter, M. L., and Krebs, J. R. (1979). Geographical variation in the song of the great tit (Parus major) in relation to ecological factors. The Journal of Animal Ecology, 759-785.
Ince, S. A., Slater, P. J. B., and Weismann, C. (1980). Changes with time in the songs of a population of chaffinches. The Condor, 82: 285-290.
Kopuchian, C., Lijtmaer, D. A., Tubaro, P. L., and Handford, P. (2004). Temporal stability and change in a microgeographical pattern of song variation in the rufous-collared sparrow. Animal Behaviour, 68: 551-559.
Krebs, J. R. (1976). Habituation and song repertoires in the great tit. Behavioral Ecology and Sociobiology, 1: 215-227.
Krebs, J. R. (1977). Song and territory in the great tit Parus major. In Evolutionary ecology, pp. 47-62. Macmillan Education UK.
Krebs, J. R., and Kroodsma, D. E. (1980). Repertoires and geographical variation in bird song. Advances in the Study of Behavior, 11: 143-177.
Krebs, J. R., Ashcroft, R., and Van Orsdol, K. (1981). Song matching in the Great Tit Parus major L. Animal Behaviour, 29: 918-923.
Kroodsma, D. E. (1977). Correlates of song organization among North American wrens. The American Naturalist, 995-1008.
Kroodsma, D. E., and Byers, B. E. (1991). The function(s) of bird song. American Zoologist, 31: 318-328.
Kroodsma, D. E. (2004). The diversity and plasticity of birdsong. Nature’s music: the science of birdsong, pp. 108-131. Elsevier Academic Press: Amsterdam.
Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2015). Package ‘lmerTest’. R package version 2.0.
Lachlan, R. F., and Slater, P. J. B. (2003).
Song learning by chaffinches: how accurate, and from where? Animal Behaviour, 65: 957-969.
Lachlan, R. F., and Servedio, M. R. (2004). Song learning accelerates allopatric speciation. Evolution, 58: 2049-2063.
Lachlan, R. F., Ratmann, O., and Nowicki, S. (2018). Cultural conformity generates extremely stable traditions in bird song. Nature Communications, 9: 1-9.
Lambrechts, M., and Dhondt, A. A. (1986). Male quality, reproduction, and survival in the great tit (Parus major). Behavioral Ecology and Sociobiology, 19: 57-63.
MacArthur, R. H. (1957). On the relative abundance of bird species. Proceedings of the National Academy of Sciences of the United States of America, 43: 293.
MacDougall-Shackleton, S. A. (1997). Sexual selection and the evolution of song repertoires. In Current ornithology, pp. 81-124. Springer, Boston, MA.
MacDougall-Shackleton, E. A., and MacDougall-Shackleton, S. A. (2001). Cultural and genetic evolution in mountain white-crowned sparrows: song dialects are associated with population structure. Evolution, 55: 2568-2575.
Mace, R. (1987). The dawn chorus in the great tit Parus major is directly related to female fertility. Nature, 330: 745-746.
Marler, P., and Tamura, M. (1962). Song “dialects” in three populations of white-crowned sparrows. Condor, 64: 368-377.
Marler, P., and Peters, S. (1982). Developmental overproduction and selective attrition: new processes in the epigenesis of birdsong. Developmental Psychobiology: The Journal of the International Society for Developmental Psychobiology, 15: 369-378.
McCleery, R. H., Pettifor, R. A., Armbruster, P., Meyer, K., Sheldon, B. C., and Perrins, C. M. (2004). Components of variance underlying fitness in a natural population of the great tit Parus major. The American Naturalist, 164: E62-E72.
McGregor, P. K., Krebs, J. R., and Perrins, C. M. (1981). Song repertoires and lifetime reproductive success in the great tit (Parus major). The American Naturalist, 149-159.
McGregor, P. K., and Krebs, J. R.
1982. Song types in a population of great tits (Parus major): their distribution, abundance and acquisition by individuals. Behaviour, 79:126–152. McGregor, P. K., and Krebs, J. R. (1989). Song learning in adult great tits (Parus major): effects of neighbours. Behaviour, 108: 139-159. McGregor, P. K., Dabelsteen, T., Shepherd, M., and Pedersen, S. B. (1992). The signal value of matched singing in great tits: evidence from interactive playback experiments. Animal Behaviour, 43: 987-998. Morton, E. S. (1975). Ecological sources of selection on avian sounds. The American Naturalist, 109: 17-34. Naguib, M., Diehl, J., Van Oers, K., and Snijders, L. (2019). Repeatability of signalling traits in the avian dawn chorus. Frontiers in zoology, 16: 27. Nelson, D. A. (1992). Song overproduction and selective attrition lead to song sharing in the field sparrow (Spizella pusilla). Behavioral Ecology and Sociobiology, 30: 415-424. Nelson, D. A., and Marler, P. 1994. Selection-based learning in bird song development. Proceedings of the National Academy of Sciences U.S.A, 91: 10498–10501. Nelson, D. A., Hallberg, K. I., and Soha, J. A. (2004). Cultural evolution of Puget sound white-crowned sparrow song dialects. Ethology, 110: 879-908. Nordby, J. C., Campbell, S. E., and Beecher, M. D. (2007). Selective attrition and individual song repertoire development in song sparrows. Animal Behaviour, 74: 1413-1418. Nowicki, S., Peters, S., and Podos, J. (1998). Song learning, early nutrition and sexual selection in songbirds. American Zoologist, 38: 179-190. 145 Payne, R. B. (1981). Song learning and social interaction in indigo buntings. Animal Behaviour, 29: 688-697. Peake, T. M., Matessi, G., McGregor, P. K., and Dabelsteen, T. (2005). Song type matching, song type switching and eavesdropping in male great tits. Animal Behaviour, 69: 1063-1068. Perrins, C. M., and McCleery, R. H. (1989). Laying dates and clutch size in the great tit. The Wilson Bulletin, 236-253. 
Podos, J., and Warren, P. S. (2007). The evolution of geographic variation in birdsong. Advances in the Study of Behavior, 37: 403-458. Price, T. (2008). Speciation in Birds. Roberts and Company Publishers, Greenwood Village, Colorado. R Core Team (2015). R: A Language and Environment for Statistical Computing. Vienna, Austria. Rivera-Gutierrez, H. F., Matthysen, E., Adriaensen, F., and Slabbekoorn, H. (2010a). Repertoire sharing and song similarity between great tit males decline with distance between forest fragments. Ethology, 116: 951-960. Rivera-Gutierrez, H. F., Pinxten, R., and Eens, M. (2010b). Multiple signals for multiple messages: great tit, Parus major, song signals age and survival. Animal Behaviour, 80: 451- 459. Rivera-Gutierrez, H. F., Pinxten, R., and Eens, M. (2011). Difficulties when assessing birdsong learning programmes under field conditions: a re-evaluation of song repertoire flexibility in the great tit. PloS one, 6: e16003. Searcy, W. A. (1992). Song repertoire and mate choice in birds. American Zoologist, 32: 71- 80. Searcy, W. A., and Andersson, M. (1986). Sexual selection and the evolution of song. Annual Review of Ecology and Systematics, 17: 507-533. Searcy, W. A., and Nowicki, S. (2005). The evolution of animal communication: reliability and deception in signaling systems. Princeton University Press. Slabbekoorn, H., and Smith, T. B. (2002). Bird song, ecology and speciation. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 357: 493-503. Slabbekoorn, H., and Peet, M. (2003). Ecology: Birds sing at a higher pitch in urban noise. Nature, 424: 267-267. 146 Slabbekoorn, H. (2004). Singing in the wild: The ecology of birdsong. In ‘Nature’s Music’. Eds P. Marler and H. Slabbekoorn. Elsevier Academic Press: Amsterdam. Slabbekoorn, H., and den Boer-Visser, A. (2006). Cities change the songs of birds. Current Biology, 16: 2326-2331. Slater, P. J. B. (1989). Bird song learning: Causes and consequences. 
Ethology, Ecology and Evolution. 1: 19–46. Slater, P. J. B., Ince, S. A., and Colgan, P. W. (1980). Chaffinch song types: their frequencies in the population and distribution between repertoires of different individuals. Behaviour, 207-218. Slater, P. J., and Lachlan, R. F. (2003). Is innovation in bird song adaptive? In: Animal innovation, ed. S. M. Reader & K. N. Laland, pp. 117-36. Oxford University Press. Thomas, A. (2019). RSPB Guide to Birdsong. Bloomsbury Publishing. Vehrencamp, S. L. (2001). Is song–type matching a conventional signal of aggressive intentions? Proceedings of the Royal Society of London. Series B: Biological Sciences, 268: 1637-1642. Wilkin, T. (2006). Environmental effects on great tit life-histories. Doctoral dissertation, University of Oxford. Wilkin, T. A., Perrins, C. M., and Sheldon, B. C. (2007a). The use of GIS in estimating spatial variation in habitat quality: a case study of lay-date in the Great Tit (Parus major). Ibis, 149: 110-118. Wilkin, T. A., Garant, D., Gosler, A. G., and Sheldon, B. C. (2007b). Edge effects in the great tit: analyses of long-term data with GIS techniques. Conservation Biology, 21: 1207-1217. Wright, T. F., Rodriguez, A. M., and Fleischer, R. C. (2005). Vocal dialects, sex-biased dispersal, and microsatellite population structure in the parrot Amazona auropalliata. Molecular Ecology 14: 1197–1205. Wright, T. F., and Dahlin, C. R. (2018). Vocal dialects in parrots: patterns and processes of cultural evolution. Emu-Austral Ornithology, 118: 50-66. Zollinger, S. A., Slater, P. J., Nemeth, E., and Brumm, H. (2017). Higher songs of city birds may not be an individual response to noise. Proceedings of the Royal Society B: Biological Sciences, 284: 20170602. 147 APPENDIX A: SUPPLEMENTARY MATERIALS FOR CHAPTER 1 Social transmission of antipredator behavior. We also used the combined z-transformed values shown in Fig. A1 as the response variable in a linear mixed model that included playback (treatment vs. 
control) and species as fixed effects and individual identity as a random effect. We observed a significant effect of treatment and no significant difference between species (LMM: playback: F = 6.23, df = 45.06, p = 0.016; species: F = 2.09, df = 15.34, p = 0.17). These tests suggest that further experiments may reveal a significant effect of social transmission within species, which was not detected in our experiment. Analyses were conducted using the lme4 package (Bates et al. 2015) in R (R Core Team 2018).

Figure A1. Histograms of separately z-transformed counts of number of alarm calls and latency to resume foraging for blue tit and great tit observers in the five minutes following playbacks. Colours indicate distributions of counts for a) blue tit observers (dark blue) and great tit observers (light blue), b) alarm calls (light blue) and latency (dark blue), c) responses to control playbacks (light blue) and treatment playbacks (dark blue).

WORKS CITED
Bates, D., Maechler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1): 1-48.
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing.

APPENDIX B: SUPPLEMENT FOR CHAPTER 2
SUPPLEMENTARY METHODS
Data synthesis. Allowing for different levels of harmonic content made it possible to simulate recordings with low levels of signal attenuation, such as those collected at close range, as well as recordings with high levels of attenuation, which could be caused by environmental factors such as habitat type or by recording conditions. By using simulated data with known classes, we were able to make better predictions about which signal characteristics or recording conditions are likely to affect performance while also avoiding the time-consuming collection of data from live animals.
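As a rough illustration of how harmonic content might be varied in a synthetic signal, the sketch below builds a tone from a fundamental plus harmonics whose amplitudes decay geometrically. The function names, the parameter `harmonic_decay`, and the geometric decay rule are assumptions made for this example; they are not the synthesis procedure used in Chapter 2.

```python
import math


def synth_harmonic_tone(f0_hz, duration_s, sample_rate=8000,
                        n_harmonics=3, harmonic_decay=0.5):
    """Return a list of samples for a tone with decaying harmonics.

    Harmonic k (k = 1 is the fundamental) has amplitude
    harmonic_decay ** (k - 1), so a smaller harmonic_decay gives weaker
    upper harmonics. Keep f0_hz * n_harmonics below sample_rate / 2
    to avoid aliasing.
    """
    n_samples = int(round(duration_s * sample_rate))
    amps = [harmonic_decay ** (k - 1) for k in range(1, n_harmonics + 1)]
    norm = sum(amps)  # keeps every sample within [-1, 1]
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(a * math.sin(2 * math.pi * k * f0_hz * t)
                    for k, a in enumerate(amps, start=1))
        samples.append(value / norm)
    return samples


def upper_harmonic_energy_fraction(n_harmonics=3, harmonic_decay=0.5):
    """Share of signal energy carried by harmonics above the fundamental."""
    energies = [(harmonic_decay ** (k - 1)) ** 2
                for k in range(1, n_harmonics + 1)]
    return 1.0 - energies[0] / sum(energies)


# Hypothetical settings: a "close-range" signal keeps strong upper harmonics,
# while a "distant" signal has them heavily attenuated.
close_range = synth_harmonic_tone(1000, 0.1, harmonic_decay=0.8)
distant = synth_harmonic_tone(1000, 0.1, harmonic_decay=0.1)
```

Because every synthetic signal is generated from known parameters, its class label is known by construction, which is what allows classification performance to be checked against ground truth.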
This approach of using synthetic data with known variation and class labels for every signal type is analogous to data augmentation in supervised machine learning. Data augmentation is a process in which labeled training data is slightly altered or modified in order to create additional annotated examples for training an algorithm, and is often employed when labeled data is scarce (Krizhevsky et al. 2012). Data augmentation has been shown to enhance the performance of deep learning models in the classification of acoustic data (McFee et al. 2015, Salamon and Bello 2017). This approach may be particularly valuable when developing tools to help bioacoustics researchers analyze field recordings, because environmental conditions can alter acoustic structure in distinct ways through scattering, frequency-dependent attenuation, and the introduction of noise.

Figure B1. Spectrograms showing examples of signals in test datasets. a) synthetic budgerigar calls, b) synthetic long-billed hermit songs. Spectrograms in the same row show different synthetic signals that are considered to be the same element type.

Figure B2. Histograms showing durations of a) field-recorded long-billed hermit songs, and b) lab-recorded budgerigar calls. Distributions of durations from live bird recordings were used to create synthetic datasets.

SUPPLEMENTARY RESULTS
To compare classification rates to those that would be expected by chance, we can calculate the random chance of correct assignment as 1/c, where c is the number of different classes. Note that to assess the statistical significance of observed correct classification rates against those theoretically expected by chance, one must adjust for the finite number of test datapoints (see Combrisson and Jerbi 2015). However, we use this value only as a point of reference for assessing supervised random forest performance. To evaluate the performance of our unsupervised method we use rigorous statistical testing.

Table B1.
Variable importance rankings indicating which feature measurements were most useful in splitting data into distinct classes were different for each of the four dataset types used for testing. Variable rankings were produced by the separate unsupervised random forest analysis used for each data set. Rankings shown for synthetic data were randomly selected from a random forest model used for synthetic data sets with 100 unique elements. Variable names are listed as they are referred to by the R packages warbleR and seewave, and correspond to the feature measurements listed in the main text.

Variable  Field-recorded  Lab-recorded    Synthetic       Synthetic
ranking   long-billed     budgerigar      long-billed     budgerigar
          hermit songs    calls           hermit songs    calls
1         var.cc23        xc.dim.1        min.cc12        max.cc13
2         var.cc16        freq.Q25        median.cc9      mean.cc24
3         var.cc24        median.cc2      kurt.cc21       kurt.cc25
4         var.cc15        freq.Q75        kurt.cc16       var.cc25
5         var.cc22        freq.median     var.cc4         var.cc8
6         median.cc4      dtw.dim.1       max.cc23        var.cc4
7         var.cc14        min.cc2         max.cc22        var.cc22
8         var.cc13        max.cc1         median.cc8      skew.cc2
9         var.cc11        median.cc6      mean.cc23       skew.cc22
10        sfm             mean.cc5        skew.cc8        kurt.cc20
11        entropy         median.cc3      kurt.cc1        kurt.cc23
12        median.cc3      var.cc5         kurt.cc19       kurt.cc21
13        median.cc5      var.cc6         var.cc22        skew.cc20
14        dtw.dim.1       median.cc7      skew.cc7        freq.IQR
15        var.cc9         min.cc6         max.cc21        time.Q25
16        kurt.cc15       max.cc5         min.cc9         maxdom
17        min.cc15        median.cc16     time.median     xc.dim.4
18        skew.cc4        xc.dim.2        dtw.dim.3       var.cc6
19        var.cc10        median.cc17     max.cc19        var.cc18
20        mean.cc6        max.cc2         kurt.cc11       var.cc19
21        kurt.cc14       time.ent        skew.cc22       var.cc5
22        mean.cc15       time.Q75        kurt.cc2        mean.cc22
23        freq.IQR        var.cc9         var.cc3         median.cc15
24        max.cc14        median.cc18     median.cc13     skew.cc19
25        skew.cc15       time.median     var.cc8         skew.cc25
26        var.cc25        median.cc15     kurt.cc20       var.cc15
27        max.cc13        kurt.cc6        mean.cc18       max.cc15
28        mean.cc14       median.cc5      var.cc25        max.cc20
29        max.cc16        median.cc8      max.cc24        median.cc13
30        skew.cc14       sfm             max.cc8         var.cc23
31        min.cc14        median.cc11     max.cc14        var.cc17
32        max.cc15        median.cc4      kurt.cc17       kurt.cc8
33        var.cc21        xc.dim.4        skew.cc6        var.cc20
34        min.cc10        max.cc3         skew.cc1        var.cc14
35        max.cc11        entropy         skew.cc10       mean.cc11
36        modindx         sd              var.cc16        median.cc7
37        skew.cc18       max.cc9         median.cc24     max.cc25
38        min.cc3         time.IQR        var.cc23        max.cc11
39        min.cc6         var.cc25        var.cc13        max.cc3
40        median.cc7      skew.cc1        max.cc20        min.cc11
41        skew.cc10       median.cc14     max.cc9         min.cc24
42        skew.cc16       min.cc16        max.cc15        min.cc15
43        kurt.cc16       var.cc1         xc.dim.4        max.cc4
44        max.cc3         median.cc19     min.cc10        max.cc5
45        max.cc5         var.cc24        max.cc3         min.cc19
46        time.ent        median.cc10     min.cc15        min.cc25
47        var.cc19        var.cc10        xc.dim.2        min.cc22
48        skew.cc13       var.cc8         sfm             min.cc18
49        skew.cc9        skew.cc6        dtw.dim.1       min.cc1
50        time.Q75        var.cc2         min.cc19        xc.dim.2
51        var.cc17        xc.dim.3        sp.ent          xc.dim.3
52        time.median     min.cc3         mindom          min.cc13
53        max.cc19        var.cc12        min.cc8         var.cc10
54        min.cc16        kurt.cc4        max.cc10        mean.cc23
55        var.cc8         var.cc4         median.cc2      kurt.cc19
56        mean.cc10       meanpeakf       mean.cc6        skew.cc13
57        max.cc10        min.cc1         var.cc24        var.cc13
58        var.cc18        var.cc3         median.cc22     median.cc19
59        mean.cc9        max.cc15        skew.cc20       median.cc9
60        kurt.cc11       skew.cc3        skew.cc18       median.cc10
61        max.cc23        skew.cc4        var.cc9         max.cc23
62        var.cc6         skew.cc5        skew.cc11       median.cc12
63        max.cc18        var.cc13        skew.cc3        median.cc8
64        kurt.cc13       var.cc7         max.cc25        max.cc16
65        kurt.cc7        skew.cc7        max.cc16        max.cc17
66        kurt.cc4        median.cc13     median.cc5      min.cc12
67        median.cc15     time.Q25        min.cc25        min.cc9
68        min.cc9         max.cc7         min.cc23        min.cc7
69        var.cc12        mean.cc12       min.cc21        min.cc21
70        skew.cc11       var.cc16        min.cc11        min.cc23
71        median.cc14     var.cc15        min.cc4         max.cc1
72        skew.cc3        kurt.cc7        dfrange         max.cc2
73        median.cc11     min.cc10        modindx         min.cc8
74        var.cc7         xc.dim.5        xc.dim.5        min.cc2
75        median.cc13     kurt.cc1        min.cc6         dtw.dim.1
76        min.cc4         dtw.dim.3       dtw.dim.5       dtw.dim.3
77        min.cc24        max.cc17        min.cc2         median.cc21
78        median.cc2      min.cc7         max.cc7         skew.cc10
79        min.cc5         median.cc20     var.cc14        kurt.cc17
80        median.cc18     min.cc11        kurt.cc14       kurt.cc16
81        skew.cc19       min.cc18        mean.d2.cc      skew.cc5
82        max.cc8         max.cc8         kurt.cc25       var.cc16
83        skew.cc17       max.cc12        kurt.cc3        mean.cc16
84        xc.dim.2        min.cc5         kurt.cc4        median.cc18
85        mean.cc8        dtw.dim.4       skew.cc17       median.cc6
86        var.cc20        min.cc4         var.cc17        max.cc12
87        skew.cc5        min.cc8         var.cc6         var.cc12
88        min.cc22        median.cc21     mean.cc11       median.cc17
89        skew.cc7        var.cc11        var.cc12        max.cc19
90        var.cc3         skew.cc8        skew.cc14       median.cc3
91        kurt.cc10       median.cc9      skew.cc5        skew.cc6
92        max.cc4         kurt.cc3        skew.cc24       var.cc2
93        min.cc18        skew.cc2        kurt.cc24       median.cc5
94        dtw.dim.2       kurt            kurt.cc23       skew.cc4
95        min.cc2         var.cc14        kurt.cc22       max.cc22
96        min.cc12        kurt.cc5        skew.cc23       max.cc18
97        median.cc21     modindx         skew.cc21       var.cc1
98        time.Q25        min.cc13        kurt.cc13       max.cc14
99        meanpeakf       freq.IQR        skew.cc25       var.cc7
100       mindom          max.cc6         kurt.cc18       skew.cc3
101       kurt.cc3        dtw.dim.2       kurt.cc9        kurt.cc13
102       median.cc23     dtw.dim.5       kurt.cc15       kurt.cc24
103       min.cc23        var.cc17        kurt.cc7        kurt.cc22
104       var.cc1         max.cc10        kurt.cc5        skew.cc15
105       startdom        var.cc23        kurt.cc12       var.cc21
106       kurt.cc23       kurt.cc25       skew.cc15       skew.cc7
107       min.cc7         max.cc11        kurt.cc10       skew.cc12
108       kurt.cc8        kurt.cc8        kurt.cc8        skew.cc11
109       max.cc24        var.cc22        skew.cc2        skew.cc23
110       kurt.cc22       median.cc22     skew.cc12       kurt.cc6
111       kurt.cc9        max.cc19        var.cc11        kurt.cc2
112       max.cc17        max.cc18        mean.cc10       skew.cc21
113       median.cc16     max.cc13        var.cc5         kurt.cc1
114       max.cc9         min.cc9         mean.cc4        skew.cc16
115       max.cc21        kurt.cc9        median.cc16     kurt.cc10
116       min.cc13        max.cc16        median.cc12     kurt.cc4
117       var.cc5         max.cc14        median.cc17     kurt.cc15
118       min.cc20        max.cc4         var.cc21        kurt.cc14
119       skew.cc6        min.cc14        var.cc15        kurt.cc5
120       median.cc12     kurt.cc24       var.cc18        skew.cc14
121       var.cc4         var.cc20        var.cc7         skew.cc18
122       dtw.dim.5       min.cc12        mean.cc7        skew.cc24
123       max.cc22        min.cc20        median.cc25     kurt.cc3
124       max.cc1         min.cc15        var.cc2         kurt.cc12
125       max.cc2         max.cc20        mean.cc2        kurt.cc9
126       skew.cc2        var.cc21        mean.cc15       kurt.cc7
127       median.cc24     median.cc23     median.cc21     kurt.cc18
128       skew.cc22       max.cc22        var.cc20        skew.cc17
129       min.cc21        dfslope         skew.cc4        skew.cc1
130       skew.cc21       min.cc21        skew.cc9        var.cc9
131       mean.cc25       min.cc19        skew.cc16       mean.cc15
132       var.cc2         var.cc19        kurt.cc6        max.cc24
133       maxdom          skew.cc10       skew.cc19       max.cc9
134       xc.dim.4        skew.cc11       skew.cc13       min.cc17
135       kurt            var.cc18        var.cc10        min.cc6
136       min.cc17        max.cc21        mean.cc19       meanpeakf
137       max.cc7         kurt.cc12       median.cc14     startdom
138       kurt.cc21       min.cc23        max.cc17        meandom
139       max.cc6         min.cc22        max.cc12        sfm
140       dfrange         max.cc24        max.cc1         time.Q75
141       min.cc19        mean.cc24       min.cc14        mindom
142       min.cc11        skew.cc9        meanpeakf       time.ent
143       median.cc22     kurt.cc2        kurt            kurt
144       dtw.dim.3       max.cc23        freq.IQR        time.median
145       median.cc19     min.cc25        freq.Q75        freq.Q25
146       skew.cc20       skew.cc23       duration        freq.median
147       median.cc20     min.cc17        sd              freq.Q75
148       dfslope         kurt.cc13       time.ent        xc.dim.1
149       kurt.cc6        maxdom          time.Q25        min.cc20
150       skew.cc1        median.cc25     min.cc5         median.cc20
151       xc.dim.1        kurt.cc11       min.cc7         var.cc24
152       max.cc25        skew.cc12       min.cc17        skew.cc8
153       kurt.cc24       min.cc24        max.cc5         var.cc11
154       skew.cc12       skew.cc15       min.cc22        median.cc14
155       min.cc8         skew.cc24       max.cc11        median.cc4
156       dtw.dim.4       max.cc25        max.cc4         max.cc21
157       skew.cc8        skew.cc19       max.cc6         max.cc10
158       skew.cc23       kurt.cc23       max.cc13        max.cc8
159       max.cc20        skew.cc16       min.cc20        max.cc7
160       kurt.cc12       skew.cc20       min.cc13        max.cc6
161       kurt.cc18       skew.cc14       min.cc18        min.cc10
162       median.cc17     skew.cc18       startdom        min.cc4
163       skew.cc24       skew.cc25       xc.dim.3        xc.dim.5
164       xc.dim.3        enddom          dtw.dim.4       dtw.dim.4
165       kurt.cc1        skew.cc17       xc.dim.1        dtw.dim.5
166       kurt.cc17       kurt.cc14       time.Q75        min.cc5
167       kurt.cc5        skew.cc21       maxdom          min.cc3
168       kurt.cc19       dfrange         enddom          dtw.dim.2
169       max.cc12        kurt.cc22       time.IQR        dfslope
170       min.cc25        skew.cc13       dfslope         modindx
171       xc.dim.5        kurt.cc10       dtw.dim.2       time.IQR
172       kurt.cc2        skew.cc22       min.cc16        enddom
173       skew.cc25       kurt.cc20       max.cc18        min.cc14
174       enddom          kurt.cc18       min.cc24        median.cc25
175       kurt.cc20       kurt.cc16       var.cc19        min.cc16
176       kurt.cc25       kurt.cc15       median.cc20     skew.cc9
177       -               kurt.cc21       -               kurt.cc11
178       -               kurt.cc17       -               var.cc3
179       -               mindom          -               elm.type
180       -               kurt.cc19       -               sd
181       -               startdom        -               duration

WORKS CITED
Combrisson, E., and Jerbi, K. (2015). Exceeding chance level by chance: The caveat of theoretical chance levels in brain signal classification and statistical assessment of decoding accuracy. Journal of Neuroscience Methods, 250: 126-136.
Dalleau, K., Couceiro, M., and Smaïl-Tabbone, M. (2018). Unsupervised extremely randomized trees. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Cham.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097-1105.
McFee, B., Humphrey, E., and Bello, J. (2015). A software framework for musical data augmentation. In 16th International Society for Music Information Retrieval Conference, pp. 248-254.
Salamon, J., and Bello, J. P. (2017). Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Processing Letters, 24: 279-283.

APPENDIX C: SUPPLEMENTARY MATERIALS FOR CHAPTER 4
The unsupervised algorithm which was used to assign songs to classes (see Chapter 2) found that optimal clustering occurred when using either 24, 37, or 54 classes of songs. In other words, when clustering similar songs together using either 24, 37, or 54 clusters, the algorithm was better able to maximize distance between clusters and minimize distance within clusters than when using other values. Because my analysis found that clustering accuracy was marginally better when using 37 song type clusters, I report the results using this number of song classes. However, it is critical to note that because the analysis also found that either 24 or 54 song classes offered comparable clustering accuracy, it is possible that the “true” number of song types in the study population may not be precisely 37 songs.
Rather, this is the best approximation of the number of song types present given the inherent constraints of assigning continuous signals to discrete classes. Ultimately, the correct song type classifications are those that match birds’ perceptions of songs; here, I attempt to estimate those classifications using objective acoustic measurements. Importantly, the results shown in Chapter 4 do not change when using 24, 37, or 54 classes of song types. Thus, the results presented here accurately describe the variation in acoustic signals in the study population, regardless of the reported number of song types.

Figure C1. Repertoire size of birds occupying territories in the study area in 2017, 2018, and 2019. Territories were calculated using Thiessen polygons. Grey polygons indicate territories of birds that were not sampled. Blue shading in polygons represents the repertoire size of the territory owner. White space surrounding territories is agricultural land that borders the study area. Axes indicate longitude and latitude coordinates used by the Ordnance Survey of Great Britain.

Figure C2. a) Immigration status of birds occupying territories in the study area in 2017, 2018, and 2019. b) Repertoire size of birds occupying territories in the study area in 2017, 2018, and 2019. Territories were calculated using Thiessen polygons. Polygon shading represents immigration status, with residents, dispersers, immigrants, and unknown birds indicated by R, D, I, or U, respectively. In a), immigrants were defined as birds that were not born in Wytham and had not previously bred in Wytham (i.e., the definition used for the analysis in this study). In b), immigrants were defined as birds not born in Wytham but that may have previously bred within Wytham. White space surrounding territories is agricultural land that borders the study area. Axes indicate longitude and latitude coordinates used by the Ordnance Survey of Great Britain.
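The comparison of candidate numbers of song classes described in this appendix can be illustrated with a small sketch. The criterion used by the algorithm in Chapter 2 is not restated here, so the example below substitutes mean silhouette width, a standard measure that is high when within-cluster distances are small relative to distances to the nearest other cluster. The function name, the toy two-dimensional points, and the candidate labelings are all hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette width: near 1 for compact, well-separated clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy feature vectors forming three tight groups (hypothetical data).
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

# Candidate partitions with 2, 3, and 4 classes.
candidates = {
    2: [0, 0, 0, 1, 1, 1, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    4: [0, 0, 0, 1, 1, 1, 2, 2, 3],
}
scores = {k: mean_silhouette(points, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)
```

When several candidate values give similar scores, as reported here for 24, 37, and 54 classes, a reasonable check is to rerun downstream analyses under each value and confirm that the conclusions do not change.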