1 00:00:09,840 --> 00:00:13,500 okay so thank you for the introduction 2 00:00:11,700 --> 00:00:15,719 Ian thank you Lars for putting this 3 00:00:13,500 --> 00:00:17,220 together I'm looking forward to uh the 4 00:00:15,719 --> 00:00:19,260 conversation that is going to follow my 5 00:00:17,220 --> 00:00:21,900 remarks uh 6 00:00:19,260 --> 00:00:23,460 the title of my remarks is uh should uh 7 00:00:21,900 --> 00:00:25,859 reproducibility be part of the 8 00:00:23,460 --> 00:00:28,080 undergraduate curriculum and I'm going 9 00:00:25,859 --> 00:00:30,060 to answer in the positive of course and 10 00:00:28,080 --> 00:00:32,640 I'm gonna put forward some ideas for 11 00:00:30,060 --> 00:00:33,899 developing some foundational skills in 12 00:00:32,640 --> 00:00:36,180 this area 13 00:00:33,899 --> 00:00:38,640 this work that I'm presenting 14 00:00:36,180 --> 00:00:41,219 today that I'm talking about is work 15 00:00:38,640 --> 00:00:42,960 done in collaboration with Alejandro 16 00:00:41,219 --> 00:00:45,540 Dellachiesa 17 00:00:42,960 --> 00:00:47,700 from the University of Kentucky I want 18 00:00:45,540 --> 00:00:49,800 to acknowledge his contribution and as I 19 00:00:47,700 --> 00:00:52,800 say that 20 00:00:49,800 --> 00:00:55,800 I want to be clear that these are just 21 00:00:52,800 --> 00:00:57,780 my opinions and do not represent the 22 00:00:55,800 --> 00:00:59,160 opinions or the position of the 23 00:00:57,780 --> 00:01:00,719 Federal Reserve Bank of St. Louis the 24 00:00:59,160 --> 00:01:05,039 Federal Open Market Committee or anybody 25 00:01:00,719 --> 00:01:05,039 across the Federal Reserve System so 26 00:01:11,119 --> 00:01:16,140 reproducibility as part of the undergraduate 27 00:01:13,740 --> 00:01:19,500 curriculum and um 28 00:01:16,140 --> 00:01:22,619 the reason to do this and just to 29 00:01:19,500 --> 00:01:24,960 um cut to the chase is that this is a 30 00:01:22,619 --> 00:01:27,900 valuable professional skill it's not an 31 00:01:24,960
--> 00:01:29,580 academic scientific skill necessarily it's 32 00:01:27,900 --> 00:01:32,100 a professional skill 33 00:01:29,580 --> 00:01:34,799 we should do this teaching throughout 34 00:01:32,100 --> 00:01:37,560 the curriculum we should not limit it to 35 00:01:34,799 --> 00:01:39,420 specifically a course in statistics or 36 00:01:37,560 --> 00:01:43,380 econometrics 37 00:01:39,420 --> 00:01:45,960 we should start by citing the data 38 00:01:43,380 --> 00:01:47,880 sources every time we make an argument 39 00:01:45,960 --> 00:01:50,880 using data 40 00:01:47,880 --> 00:01:54,479 and we should teach this by 41 00:01:50,880 --> 00:01:58,820 leading by example so our students will 42 00:01:54,479 --> 00:01:58,820 do as they see us 43 00:02:00,659 --> 00:02:05,280 to expand a little bit more so our 44 00:02:03,060 --> 00:02:08,360 students understand why we need to cite 45 00:02:05,280 --> 00:02:11,280 the data properly a good data citation 46 00:02:08,360 --> 00:02:13,739 shows the background work and learning 47 00:02:11,280 --> 00:02:14,700 that goes into putting a research argument 48 00:02:13,739 --> 00:02:17,160 together 49 00:02:14,700 --> 00:02:20,160 so the research would be much more 50 00:02:17,160 --> 00:02:22,980 thorough and the graduate student will 51 00:02:20,160 --> 00:02:25,620 come across as more reliable 52 00:02:22,980 --> 00:02:28,140 it allows a good citation allows us 53 00:02:25,620 --> 00:02:31,080 to track the data sets which is the 54 00:02:28,140 --> 00:02:33,620 first step to be able to replicate uh 55 00:02:31,080 --> 00:02:33,620 research 56 00:02:34,680 --> 00:02:38,640 so where does the literature in 57 00:02:36,840 --> 00:02:43,280 economic education and library science 58 00:02:38,640 --> 00:02:47,220 stand on data citations and 59 00:02:43,280 --> 00:02:50,400 uh data literacy so this is a very 60 00:02:47,220 --> 00:02:52,019 foundational data literacy skill and the 61 00:02:50,400 --> 00:02:54,780 literature
in economic education that 62 00:02:52,019 --> 00:02:58,620 puts forward examples on teaching and 63 00:02:54,780 --> 00:02:59,360 learning with data always argues for 64 00:02:58,620 --> 00:03:02,819 um 65 00:02:59,360 --> 00:03:05,280 searching for the data naming the data 66 00:03:02,819 --> 00:03:06,599 in library science the librarians have 67 00:03:05,280 --> 00:03:10,739 been working on these foundational 68 00:03:06,599 --> 00:03:14,099 skills of citations and references for a 69 00:03:10,739 --> 00:03:17,099 long time so 70 00:03:14,099 --> 00:03:17,879 specifically the next slide will show 71 00:03:17,099 --> 00:03:19,739 you 72 00:03:17,879 --> 00:03:22,140 that 73 00:03:19,739 --> 00:03:25,500 whereas in economic education we've been 74 00:03:22,140 --> 00:03:28,080 working out of the arguments of Hansen 75 00:03:25,500 --> 00:03:30,840 to develop expected proficiencies in 76 00:03:28,080 --> 00:03:33,420 economic education among undergraduate 77 00:03:30,840 --> 00:03:37,640 students and as librarians have 78 00:03:33,420 --> 00:03:41,099 identified seven standalone competencies 79 00:03:37,640 --> 00:03:43,140 related to working with data in a 80 00:03:41,099 --> 00:03:46,860 business context 81 00:03:43,140 --> 00:03:48,780 since 2019 we have a very solid 82 00:03:46,860 --> 00:03:50,519 connection between those two 83 00:03:48,780 --> 00:03:54,900 sides the statement of the American 84 00:03:50,519 --> 00:03:57,599 Economic Association that directs 85 00:03:54,900 --> 00:03:59,040 researchers to cite all the sources of 86 00:03:57,599 --> 00:04:02,099 their data 87 00:03:59,040 --> 00:04:04,500 so that's what the theory is 88 00:04:02,099 --> 00:04:08,280 what empirical evidence do we have about 89 00:04:04,500 --> 00:04:12,659 how this is being done in the classroom 90 00:04:08,280 --> 00:04:15,780 so I'm gonna share with you one visual 91 00:04:12,659 --> 00:04:18,540 out of this uh paper on baseline 92 00:04:15,780 --> 00:04:19,799 competencies
and student self-efficacy in 93 00:04:18,540 --> 00:04:24,300 data literacy 94 00:04:19,799 --> 00:04:27,900 so this next slide shows you how 900 95 00:04:24,300 --> 00:04:31,020 students from uh coming from colleges 96 00:04:27,900 --> 00:04:33,900 and universities and 450 high school 97 00:04:31,020 --> 00:04:38,639 students are able to address are able to 98 00:04:33,900 --> 00:04:41,660 answer seven pre-test questions related 99 00:04:38,639 --> 00:04:45,979 to data literacy competencies 100 00:04:41,660 --> 00:04:45,979 defined by the library profession 101 00:04:46,440 --> 00:04:49,460 this slide 102 00:04:49,919 --> 00:04:54,540 shows you that high school students and 103 00:04:53,040 --> 00:04:58,380 college students 104 00:04:54,540 --> 00:05:01,139 have exactly the same level of 105 00:04:58,380 --> 00:05:03,840 proficiency when it comes to addressing 106 00:05:01,139 --> 00:05:07,139 data literacy skills but college 107 00:05:03,840 --> 00:05:10,800 students are way more confident in their 108 00:05:07,139 --> 00:05:13,880 skills than high school students so that 109 00:05:10,800 --> 00:05:13,880 means that we have a lot 110 00:05:24,139 --> 00:05:28,440 digital work 111 00:05:26,660 --> 00:05:31,860 now 112 00:05:28,440 --> 00:05:33,539 that's just a very at a very high level 113 00:05:31,860 --> 00:05:38,039 um my colleague Alejandro and I we've 114 00:05:33,539 --> 00:05:41,100 done some more work uh trying to uh 115 00:05:38,039 --> 00:05:43,380 address or trying to uh survey the 116 00:05:41,100 --> 00:05:46,139 skills that the baseline skills that the 117 00:05:43,380 --> 00:05:50,039 students have about attribution can 118 00:05:46,139 --> 00:05:53,280 students name can they identify the data used 119 00:05:50,039 --> 00:05:55,380 in an economic argument can they uh 120 00:05:53,280 --> 00:05:58,979 recognize the sources 121 00:05:55,380 --> 00:06:01,259 can they identify what a complete data 122 00:05:58,979 --> 00:06:04,160 citation is 123
00:06:01,259 --> 00:06:06,840 so to do that we 124 00:06:04,160 --> 00:06:09,680 designed a short assignment with three 125 00:06:06,840 --> 00:06:12,840 parts we gave the students some 126 00:06:09,680 --> 00:06:14,880 foundational instruction on what a data 127 00:06:12,840 --> 00:06:16,259 citation in economics 128 00:06:14,880 --> 00:06:18,960 could look like 129 00:06:16,259 --> 00:06:22,080 we got them to practice those skills 130 00:06:18,960 --> 00:06:24,360 reading two different short economic 131 00:06:22,080 --> 00:06:27,740 letters and then we gave them an 132 00:06:24,360 --> 00:06:30,840 opportunity to reflect and to identify 133 00:06:27,740 --> 00:06:33,240 references and compare the citations 134 00:06:30,840 --> 00:06:34,680 between essays so what is it that we 135 00:06:33,240 --> 00:06:37,380 found 136 00:06:34,680 --> 00:06:39,900 we use the Page Ones one of the Page 137 00:06:37,380 --> 00:06:41,460 Ones that are primers that we produce at 138 00:06:39,900 --> 00:06:42,720 the Research Division at the Federal Reserve 139 00:06:41,460 --> 00:06:46,259 Bank of St. Louis 140 00:06:42,720 --> 00:06:49,080 we use those two next slide please two 141 00:06:46,259 --> 00:06:53,280 Economic Synopses less than 2,000 words 142 00:06:49,080 --> 00:06:55,199 just a visual a graph no equations no 143 00:06:53,280 --> 00:06:56,360 regressions an argument based on a data 144 00:06:55,199 --> 00:06:59,340 visualization 145 00:06:56,360 --> 00:07:01,740 and we got we got 146 00:06:59,340 --> 00:07:06,800 uh this is what we found 147 00:07:01,740 --> 00:07:09,720 we started with 854 students with 77 77 148 00:07:06,800 --> 00:07:10,919 so who were willing to participate of 149 00:07:09,720 --> 00:07:13,680 those 150 00:07:10,919 --> 00:07:15,479 um 80 about eight percent were I started 151 00:07:13,680 --> 00:07:18,300 the assignment and 97 of those 152 00:07:15,479 --> 00:07:21,900 completed it so we got pretty good 153 00:07:18,300 --> 00:07:23,099 uh pretty good
participation the makeup 154 00:07:21,900 --> 00:07:25,080 of the experience that we were asking 155 00:07:23,099 --> 00:07:26,520 these are not your introductory students 156 00:07:25,080 --> 00:07:27,599 these are the students that have at 157 00:07:26,520 --> 00:07:29,360 least 158 00:07:27,599 --> 00:07:32,280 um one and a half 159 00:07:29,360 --> 00:07:33,539 economic courses under their belt they 160 00:07:32,280 --> 00:07:35,099 have taken 161 00:07:33,539 --> 00:07:36,840 um they're not novices to 162 00:07:35,099 --> 00:07:38,880 statistics so they have some statistics 163 00:07:36,840 --> 00:07:43,740 courses with them 164 00:07:38,880 --> 00:07:47,360 um above B uh GPA most of them business 165 00:07:43,740 --> 00:07:48,979 majors and your standard 166 00:07:47,360 --> 00:07:51,060 distribution 167 00:07:48,979 --> 00:07:54,240 gender distribution and minority 168 00:07:51,060 --> 00:07:56,639 distribution in terms of their social 169 00:07:54,240 --> 00:07:59,759 demographic profile 170 00:07:56,639 --> 00:08:02,099 so what we found from these students is 171 00:07:59,759 --> 00:08:05,460 that um 172 00:08:02,099 --> 00:08:09,360 about half of them can identify the 173 00:08:05,460 --> 00:08:12,300 series correctly in an economic uh in 174 00:08:09,360 --> 00:08:15,419 those two economic letters 175 00:08:12,300 --> 00:08:17,940 a very small proportion of them 176 00:08:15,419 --> 00:08:19,800 can identify the sources correctly so 177 00:08:17,940 --> 00:08:22,020 even though they might be aware that the 178 00:08:19,800 --> 00:08:24,120 series have to do with the Consumer 179 00:08:22,020 --> 00:08:27,539 Price Index as one of the essays says 180 00:08:24,120 --> 00:08:29,340 good luck trying to find where that 181 00:08:27,539 --> 00:08:30,360 where those series are even in other 182 00:08:29,340 --> 00:08:33,899 sources 183 00:08:30,360 --> 00:08:34,700 and a very small proportion are able 184 00:08:33,899 --> 00:08:39,120 to 185
00:08:34,700 --> 00:08:41,880 identify an incomplete citation 186 00:08:39,120 --> 00:08:43,919 um a note you can get a negative score 187 00:08:41,880 --> 00:08:45,540 in this assignment because we calculate 188 00:08:43,919 --> 00:08:47,279 the difference between the number of 189 00:08:45,540 --> 00:08:50,160 correct answers and number of incorrect 190 00:08:47,279 --> 00:08:52,260 answers when calculating the uh the 191 00:08:50,160 --> 00:08:55,920 score reported there 192 00:08:52,260 --> 00:08:59,160 we also look into the misconceptions and 193 00:08:55,920 --> 00:09:00,839 mistakes that they made we noticed that 194 00:08:59,160 --> 00:09:01,740 we learned that 195 00:09:00,839 --> 00:09:04,380 um 196 00:09:01,740 --> 00:09:06,600 although very few can identify the 197 00:09:04,380 --> 00:09:10,620 resources excuse me the sources of the 198 00:09:06,600 --> 00:09:12,779 data many of them confuse the source 199 00:09:10,620 --> 00:09:15,480 with the distributor 200 00:09:12,779 --> 00:09:17,160 um which is a problem particularly for 201 00:09:15,480 --> 00:09:19,140 us at the Federal Reserve Bank of St. Louis 202 00:09:17,160 --> 00:09:20,700 because we want to be perceived and 203 00:09:19,140 --> 00:09:23,580 we want to come across as 204 00:09:20,700 --> 00:09:25,560 trustworthy distributors of data but we 205 00:09:23,580 --> 00:09:27,480 don't want to be seen as the source of 206 00:09:25,560 --> 00:09:29,220 the data 207 00:09:27,480 --> 00:09:29,820 um finally 208 00:09:29,220 --> 00:09:33,000 um 209 00:09:29,820 --> 00:09:35,820 you see there the proportion also less 210 00:09:33,000 --> 00:09:39,060 than 50 percent of the students who 211 00:09:35,820 --> 00:09:41,100 consider the citations were complete 212 00:09:39,060 --> 00:09:43,500 so where does this thing put us where 213 00:09:41,100 --> 00:09:45,899 does this thing take us 214 00:09:43,500 --> 00:09:47,459 so the implications for instruction and 215 00:09:45,899 --> 00:09:50,160
curriculum 216 00:09:47,459 --> 00:09:52,080 um are the following uh enlist the help 217 00:09:50,160 --> 00:09:54,300 of the librarians they've been working 218 00:09:52,080 --> 00:09:56,339 in this space for a long time they're 219 00:09:54,300 --> 00:09:59,760 your allies 220 00:09:56,339 --> 00:10:02,880 um be consistent when naming the sources 221 00:09:59,760 --> 00:10:05,580 of the data that that you use 222 00:10:02,880 --> 00:10:08,700 and embed that practice in all your 223 00:10:05,580 --> 00:10:11,339 teaching this is not a micro skill or a 224 00:10:08,700 --> 00:10:13,320 macro skill or an econometric skill every 225 00:10:11,339 --> 00:10:16,019 time we make an argument every time we 226 00:10:13,320 --> 00:10:19,500 show a graph every time we use data we 227 00:10:16,019 --> 00:10:22,620 name the data we name the source and we 228 00:10:19,500 --> 00:10:25,320 should be leading by example 229 00:10:22,620 --> 00:10:28,380 last but not least I just want to remind 230 00:10:25,320 --> 00:10:30,540 you all of you in the room and online 231 00:10:28,380 --> 00:10:33,540 that citing the data is a foundational 232 00:10:30,540 --> 00:10:34,920 skill and it can be practiced and it 233 00:10:33,540 --> 00:10:36,959 should be practiced across the 234 00:10:34,920 --> 00:10:39,360 curriculum 235 00:10:36,959 --> 00:10:41,459 that's the end 236 00:10:39,360 --> 00:10:43,440 let me actually take a prerogative here 237 00:10:41,459 --> 00:10:47,100 can I get the training on data citations 238 00:10:43,440 --> 00:10:49,800 as a self-guided module for senior 239 00:10:47,100 --> 00:10:52,740 authors of AEA articles 240 00:10:49,800 --> 00:10:54,360 because I don't actually think that the 241 00:10:52,740 --> 00:10:57,140 proportions getting it right are going 242 00:10:54,360 --> 00:10:57,140 to be drastically different 243 00:11:00,320 --> 00:11:04,920 I mean that's I think the key part of 244 00:11:03,300 --> 00:11:08,300 live that curriculum 245 00:11:04,920 -->
00:11:11,700 um is that it's not intrinsic to how 246 00:11:08,300 --> 00:11:13,019 economics has been taught in the past 247 00:11:11,700 --> 00:11:15,240 um probably not how it's being taught 248 00:11:13,019 --> 00:11:16,500 now and that's what we're conveying to 249 00:11:15,240 --> 00:11:18,959 students so 250 00:11:16,500 --> 00:11:21,420 um my guess is that that's pretty much 251 00:11:18,959 --> 00:11:23,820 what we get at the AEA as well 252 00:11:21,420 --> 00:11:25,680 two weeks ago we had a workshop with about 253 00:11:23,820 --> 00:11:28,500 30 journalists 254 00:11:25,680 --> 00:11:31,740 and we were showing them how to spread 255 00:11:28,500 --> 00:11:33,240 to tell the stories behind the data and 256 00:11:31,740 --> 00:11:35,040 a lot of the journalists asked me you know 257 00:11:33,240 --> 00:11:37,459 what's the biggest piece that you have 258 00:11:35,040 --> 00:11:40,320 right you can get asked to do one thing 259 00:11:37,459 --> 00:11:41,579 uh what's the biggest thing to fix and I 260 00:11:40,320 --> 00:11:44,399 suggest please 261 00:11:41,579 --> 00:11:47,220 give credit to the sources as they 262 00:11:44,399 --> 00:11:50,279 deserve it so if you find a photograph 263 00:11:47,220 --> 00:11:52,320 congratulations but spend some time 264 00:11:50,279 --> 00:11:54,779 bringing people out below the canvas 265 00:11:52,320 --> 00:11:57,480 where it says source and name the source 266 00:11:54,779 --> 00:11:59,519 friendly Source uh you're not helping 267 00:11:57,480 --> 00:12:03,140 your readers if you're not telling them 268 00:11:59,519 --> 00:12:03,140 what agents explore with that 269 00:12:04,800 --> 00:12:09,360 any questions from the audience 270 00:12:07,920 --> 00:12:10,200 might have time at the end to discuss 271 00:12:09,360 --> 00:12:12,720 too 272 00:12:10,200 --> 00:12:14,459 okay so Lars thanks very much for 273 00:12:12,720 --> 00:12:15,720 organizing this I'm very happy to be 274 00:12:14,459 --> 00:12:18,480 here and uh
275 00:12:15,720 --> 00:12:21,380 hear other people's views and share mine 276 00:12:18,480 --> 00:12:21,380 and hope it leads to more to 277 00:12:21,480 --> 00:12:25,860 more uh so the title of this whole panel 278 00:12:24,180 --> 00:12:27,060 is should teaching reproducibility be a 279 00:12:25,860 --> 00:12:28,920 part of undergraduate education 280 00:12:27,060 --> 00:12:32,399 curriculum and to give you a spoiler 281 00:12:28,920 --> 00:12:35,399 alert I think the answer is yes and 282 00:12:32,399 --> 00:12:40,019 uh I'll tell you why and and have a few 283 00:12:35,399 --> 00:12:42,720 perspectives on it so so in fact 284 00:12:40,019 --> 00:12:44,760 I'm going to build this up but the punch 285 00:12:42,720 --> 00:12:46,920 line is going to be that really 286 00:12:44,760 --> 00:12:49,019 it shouldn't just be part of the 287 00:12:46,920 --> 00:12:51,240 curriculum it should be thoroughly 288 00:12:49,019 --> 00:12:53,160 integrated from the beginning and 289 00:12:51,240 --> 00:12:55,800 through the middle and to the very end 290 00:12:53,160 --> 00:12:57,660 that basically every time students are 291 00:12:55,800 --> 00:12:59,339 working with statistical data they 292 00:12:57,660 --> 00:13:01,320 should be making documentation for what 293 00:12:59,339 --> 00:13:04,260 they do and turning it in with their 294 00:13:01,320 --> 00:13:06,360 projects so if it's just uh in your very 295 00:13:04,260 --> 00:13:07,079 first intro stats class where you have 296 00:13:06,360 --> 00:13:10,560 to 297 00:13:07,079 --> 00:13:13,019 do a t-test to see if mean compensation 298 00:13:10,560 --> 00:13:15,060 for women is the same as men 299 00:13:13,019 --> 00:13:16,920 so what do people do now they go to the 300 00:13:15,060 --> 00:13:19,139 computer they type t-test men versus 301 00:13:16,920 --> 00:13:20,700 women to get some results maybe they 302 00:13:19,139 --> 00:13:23,700 copy and paste them put them in their 303 00:13:20,700 --> 00:13:26,459 paper print it
out and turn them in 304 00:13:23,700 --> 00:13:27,839 and what I would advocate is doing all 305 00:13:26,459 --> 00:13:31,380 of those things 306 00:13:27,839 --> 00:13:33,480 and writing a little script that says 307 00:13:31,380 --> 00:13:35,040 open the data file 308 00:13:33,480 --> 00:13:37,579 do a t-test 309 00:13:35,040 --> 00:13:40,680 possibly save the output from the t-test 310 00:13:37,579 --> 00:13:43,079 and then that do file gets turned in 311 00:13:40,680 --> 00:13:45,420 electronically at the same time that a 312 00:13:43,079 --> 00:13:47,760 piece of paper gets turned in 313 00:13:45,420 --> 00:13:50,459 and if it's a slightly more involved 314 00:13:47,760 --> 00:13:53,639 kind of project then 315 00:13:50,459 --> 00:13:56,040 then you know maybe at some point you 316 00:13:53,639 --> 00:13:57,180 want to say hey put that script in one 317 00:13:56,040 --> 00:13:59,220 folder and put the data in a different 318 00:13:57,180 --> 00:14:00,060 folder and maybe have a folder for your 319 00:13:59,220 --> 00:14:02,399 output 320 00:14:00,060 --> 00:14:03,600 and gradually as assignments get more 321 00:14:02,399 --> 00:14:05,459 complex 322 00:14:03,600 --> 00:14:07,260 uh the structure of what students have to 323 00:14:05,459 --> 00:14:09,480 create and turn in gets more complex 324 00:14:07,260 --> 00:14:12,000 as people start doing research they've 325 00:14:09,480 --> 00:14:14,399 got to start putting data citations have 326 00:14:12,000 --> 00:14:16,019 to have some kind of guide to data 327 00:14:14,399 --> 00:14:17,160 sources that gives all those data 328 00:14:16,019 --> 00:14:20,639 citations 329 00:14:17,160 --> 00:14:23,700 a nice README file but all this could 330 00:14:20,639 --> 00:14:25,079 like be built up piece by piece as uh as 331 00:14:23,700 --> 00:14:27,959 students go along through the curriculum 332 00:14:25,079 --> 00:14:30,180 and and so 333 00:14:27,959 --> 00:14:31,680 it shouldn't be a special topic 334 00:14:30,180 -->
00:14:33,660 shouldn't be like a special half credit 335 00:14:31,680 --> 00:14:37,680 course on reproducibility 336 00:14:33,660 --> 00:14:39,899 uh and it should just be the routine way 337 00:14:37,680 --> 00:14:41,579 things are done so that someday people 338 00:14:39,899 --> 00:14:43,680 don't have to talk about it just like 339 00:14:41,579 --> 00:14:45,180 nobody talks about putting a bibliography 340 00:14:43,680 --> 00:14:47,399 at the end of the paper 341 00:14:45,180 --> 00:14:49,079 uh uh they should do this without 342 00:14:47,399 --> 00:14:51,959 without talking about it so that that's 343 00:14:49,079 --> 00:14:54,360 my view and and in economics and across 344 00:14:51,959 --> 00:14:57,660 the social sciences and I'll comment 345 00:14:54,360 --> 00:15:00,120 about undergraduate curriculum uh 346 00:14:57,660 --> 00:15:02,399 I I think that in getting this stuff 347 00:15:00,120 --> 00:15:04,560 into an undergraduate curriculum 348 00:15:02,399 --> 00:15:06,779 economics has been slower than some 349 00:15:04,560 --> 00:15:09,899 other fields I think maybe possibly 350 00:15:06,779 --> 00:15:12,540 political science has been most uh 351 00:15:09,899 --> 00:15:14,940 progressive about this I think economics 352 00:15:12,540 --> 00:15:16,860 has everybody 353 00:15:14,940 --> 00:15:19,560 economics has got the gold standard for 354 00:15:16,860 --> 00:15:21,660 journal editing and and documenting 355 00:15:19,560 --> 00:15:23,279 professional studies but I think in the 356 00:15:21,660 --> 00:15:24,839 curricular area 357 00:15:23,279 --> 00:15:28,260 uh we have a little bit of catching up 358 00:15:24,839 --> 00:15:30,420 to do and but but really even across 359 00:15:28,260 --> 00:15:32,100 disciplines it's still quite a small 360 00:15:30,420 --> 00:15:34,079 fraction that are doing this in a 361 00:15:32,100 --> 00:15:36,060 careful way 362 00:15:34,079 --> 00:15:37,920 okay so 363 00:15:36,060 --> 00:15:39,600 so I want to tell you a bit like how I
364 00:15:37,920 --> 00:15:42,360 why do I think it should be so 365 00:15:39,600 --> 00:15:43,860 ubiquitous what's the purpose of it and 366 00:15:42,360 --> 00:15:45,959 does it really make sense in principle 367 00:15:43,860 --> 00:15:47,519 and then the last bit about how can we 368 00:15:45,959 --> 00:15:50,480 get there 369 00:15:47,519 --> 00:15:54,240 I don't really know I've got some ideas 370 00:15:50,480 --> 00:15:56,699 and it it's going to take 371 00:15:54,240 --> 00:15:59,940 collaboration among a lot of 372 00:15:56,699 --> 00:16:02,040 stakeholders so to speak uh but I think 373 00:15:59,940 --> 00:16:05,699 we're at a moment where that could 374 00:16:02,040 --> 00:16:07,980 happen effectively uh but so that's kind 375 00:16:05,699 --> 00:16:10,079 of speculative about the future 376 00:16:07,980 --> 00:16:12,060 so 377 00:16:10,079 --> 00:16:14,459 so why why do I think it's such a good 378 00:16:12,060 --> 00:16:16,980 idea it's based on my own experience so 379 00:16:14,459 --> 00:16:20,100 I'm not a data scientist or a computer 380 00:16:16,980 --> 00:16:21,839 guy or anything I was just 381 00:16:20,100 --> 00:16:24,480 teaching an introductory statistics 382 00:16:21,839 --> 00:16:28,199 course for economics majors in which 383 00:16:24,480 --> 00:16:29,880 they wrote research papers so working 384 00:16:28,199 --> 00:16:32,279 in little teams for the whole semester they 385 00:16:29,880 --> 00:16:35,160 choose a topic review the literature find some 386 00:16:32,279 --> 00:16:38,160 data from some real source and then it's an 387 00:16:35,160 --> 00:16:39,839 intro stats course so simple mainly 388 00:16:38,160 --> 00:16:42,420 just descriptive statistics bar graphs 389 00:16:39,839 --> 00:16:45,839 and stuff to try to get some idea about 390 00:16:42,420 --> 00:16:47,579 some issue they chose and for my first 391 00:16:45,839 --> 00:16:50,639 few years of doing this 392 00:16:47,579 --> 00:16:53,100 it was a great idea but I really could 393
00:16:50,639 --> 00:16:55,019 not understand what they were saying in 394 00:16:53,100 --> 00:16:56,759 their papers what they were saying about 395 00:16:55,019 --> 00:16:59,339 what they did with their data they would 396 00:16:56,759 --> 00:17:00,540 talk about merging these data sets and 397 00:16:59,339 --> 00:17:02,699 there were 398 00:17:00,540 --> 00:17:04,799 no possible variables that could have 399 00:17:02,699 --> 00:17:07,740 linked in any way between the data sets 400 00:17:04,799 --> 00:17:09,360 and and the worst part was when I would ask 401 00:17:07,740 --> 00:17:12,839 them about it 402 00:17:09,360 --> 00:17:14,400 they did not know and why is that they 403 00:17:12,839 --> 00:17:17,040 were so they were they were all using 404 00:17:14,400 --> 00:17:20,699 Stata and they were typing commands but 405 00:17:17,040 --> 00:17:24,299 just interactively and and so one day 406 00:17:20,699 --> 00:17:26,459 walking across campus I said to myself ah 407 00:17:24,299 --> 00:17:28,439 problem solved have them turn in their 408 00:17:26,459 --> 00:17:30,059 data files and some do files their do 409 00:17:28,439 --> 00:17:31,500 files and then 410 00:17:30,059 --> 00:17:33,840 problem solved I'll just see what they 411 00:17:31,500 --> 00:17:36,840 did and that did not solve the problem 412 00:17:33,840 --> 00:17:39,960 because what they turned in was bits and 413 00:17:36,840 --> 00:17:42,539 random scraps of bits of do files that 414 00:17:39,960 --> 00:17:45,240 didn't run and the data wasn't there and 415 00:17:42,539 --> 00:17:48,780 so slowly over several years 416 00:17:45,240 --> 00:17:51,780 uh I figured out what kind of structure 417 00:17:48,780 --> 00:17:53,760 they needed to be able to put together a 418 00:17:51,780 --> 00:17:57,600 whole package that had everything it 419 00:17:53,760 --> 00:18:00,179 needed but everything would run and 420 00:17:57,600 --> 00:18:02,840 and 421 00:18:00,179 --> 00:18:05,520 after five or six years of
experimenting 422 00:18:02,840 --> 00:18:09,600 students now do this 423 00:18:05,520 --> 00:18:11,580 with close to 100 percent success rate like 424 00:18:09,600 --> 00:18:14,340 there are issues with the papers they turn 425 00:18:11,580 --> 00:18:15,660 in but I can I can run them and see what 426 00:18:14,340 --> 00:18:16,980 they did 427 00:18:15,660 --> 00:18:19,100 and 428 00:18:16,980 --> 00:18:19,100 and 429 00:18:19,320 --> 00:18:23,640 let me tell you what what the benefits 430 00:18:21,600 --> 00:18:25,980 that came from that is because so the 431 00:18:23,640 --> 00:18:27,860 immediate the immediate goal was that I 432 00:18:25,980 --> 00:18:31,200 could understand what they did 433 00:18:27,860 --> 00:18:32,940 and that works because now they turn in 434 00:18:31,200 --> 00:18:34,860 stuff like that 435 00:18:32,940 --> 00:18:37,799 I can see step by step exactly what they 436 00:18:34,860 --> 00:18:39,960 did I understand it and in fact 437 00:18:37,799 --> 00:18:41,460 they understand it better they can talk 438 00:18:39,960 --> 00:18:42,780 much more intelligently they can 439 00:18:41,460 --> 00:18:44,760 interpret their results much more 440 00:18:42,780 --> 00:18:47,400 intelligently because 441 00:18:44,760 --> 00:18:49,559 they know they know what they did 442 00:18:47,400 --> 00:18:51,360 uh 443 00:18:49,559 --> 00:18:52,860 the middle point about dramatic 444 00:18:51,360 --> 00:18:55,980 enhancement of ability to advise and 445 00:18:52,860 --> 00:18:57,840 evaluate student projects so a key key 446 00:18:55,980 --> 00:18:58,700 thing is that they keep all their stuff 447 00:18:57,840 --> 00:19:02,340 up 448 00:18:58,700 --> 00:19:04,799 on a file sharing platform and you can 449 00:19:02,340 --> 00:19:07,799 use the one you like GitHub or OSF or 450 00:19:04,799 --> 00:19:10,080 Google Drive or Box whatever and when 451 00:19:07,799 --> 00:19:11,820 they come to see me first thing I do is 452 00:19:10,080 --> 00:19:12,900 I go and I download all their stuff
onto 453 00:19:11,820 --> 00:19:15,600 my computer 454 00:19:12,900 --> 00:19:17,460 and if they have some question about why 455 00:19:15,600 --> 00:19:19,980 some regression is dropping some variable I 456 00:19:17,460 --> 00:19:21,900 won't look at it until I've opened their 457 00:19:19,980 --> 00:19:24,600 first do file and worked through it bit 458 00:19:21,900 --> 00:19:26,340 by bit and see exactly what they've done 459 00:19:24,600 --> 00:19:27,660 then we get to the problem and the fact 460 00:19:26,340 --> 00:19:30,179 is we never get to the problem they're 461 00:19:27,660 --> 00:19:32,039 asking about because something comes up 462 00:19:30,179 --> 00:19:35,940 before we straighten that out 463 00:19:32,039 --> 00:19:38,400 and then the later problem is solved so 464 00:19:35,940 --> 00:19:40,260 when I look back to what I used to do 465 00:19:38,400 --> 00:19:41,640 before that when they'd just print out 466 00:19:40,260 --> 00:19:44,600 a regression where some variable was 467 00:19:41,640 --> 00:19:44,600 dropped and ask me what happened 468 00:19:45,260 --> 00:19:50,039 I would try to 469 00:19:48,360 --> 00:19:52,080 think what can I do to get this student 470 00:19:50,039 --> 00:19:55,380 to leave my office without looking 471 00:19:52,080 --> 00:19:56,880 really flagrantly negligent and uh so 472 00:19:55,380 --> 00:19:57,900 this this is an improvement over that 473 00:19:56,880 --> 00:20:01,620 situation 474 00:19:57,900 --> 00:20:04,080 uh and students catch on 475 00:20:01,620 --> 00:20:07,740 quickly and uh 476 00:20:04,080 --> 00:20:10,799 so that kind of communication is really 477 00:20:07,740 --> 00:20:13,020 really the big thing and 478 00:20:10,799 --> 00:20:14,880 and the bottom thing is like it 479 00:20:13,020 --> 00:20:17,100 reinforces core lessons about 480 00:20:14,880 --> 00:20:19,559 intellectual integrity that particularly 481 00:20:17,100 --> 00:20:22,440 for undergraduate education I think are 482 00:20:19,559 --> 00:20:24,120
a big part of what it's about so just 483 00:20:22,440 --> 00:20:25,860 learning that 484 00:20:24,120 --> 00:20:29,280 I can't just 485 00:20:25,860 --> 00:20:31,380 say something and justify it by waving 486 00:20:29,280 --> 00:20:34,140 my arms I've got to make a careful 487 00:20:31,380 --> 00:20:36,000 argument based on evidence and carefully 488 00:20:34,140 --> 00:20:38,880 looking at my evidence and understanding 489 00:20:36,000 --> 00:20:42,000 how all the pieces really fit together 490 00:20:38,880 --> 00:20:44,280 uh and that's what can make some kind of 491 00:20:42,000 --> 00:20:46,020 claim so 492 00:20:44,280 --> 00:20:47,580 you know in the old days you know 493 00:20:46,020 --> 00:20:48,900 frankly 494 00:20:47,580 --> 00:20:50,520 I didn't understand what they were 495 00:20:48,900 --> 00:20:52,260 writing they knew I didn't understand 496 00:20:50,520 --> 00:20:55,020 what they were writing 497 00:20:52,260 --> 00:20:56,820 and they'd all get B pluses and A 498 00:20:55,020 --> 00:21:01,620 minuses and life went on but that's like 499 00:20:56,820 --> 00:21:04,679 a bad message to send them so so I think 500 00:21:01,620 --> 00:21:06,480 these are like pedagogical benefits that 501 00:21:04,679 --> 00:21:09,660 I think are special to undergraduate 502 00:21:06,480 --> 00:21:11,160 education uh that I think get less 503 00:21:09,660 --> 00:21:12,360 attention but if we're talking about 504 00:21:11,160 --> 00:21:14,400 undergraduate education I think they're 505 00:21:12,360 --> 00:21:17,700 actually really fundamental let's go to 506 00:21:14,400 --> 00:21:19,980 the next slide so these are also very 507 00:21:17,700 --> 00:21:21,419 professional very important things I 508 00:21:19,980 --> 00:21:23,400 think these are the things that are more 509 00:21:21,419 --> 00:21:25,500 commonly recognized that 510 00:21:23,400 --> 00:21:27,120 you know working reproducibly that's 511 00:21:25,500 --> 00:21:28,620 a necessary condition for doing
credible 512 00:21:27,120 --> 00:21:31,320 research so if you're teaching research 513 00:21:28,620 --> 00:21:34,740 skills you better teach them to do it uh 514 00:21:31,320 --> 00:21:37,380 and relatedly it's becoming an essential 515 00:21:34,740 --> 00:21:39,480 professional skill among researchers 516 00:21:37,380 --> 00:21:41,220 and even for students who don't want to 517 00:21:39,480 --> 00:21:43,919 go into research careers they're going 518 00:21:41,220 --> 00:21:45,659 to be RAs or analysts when they 519 00:21:43,919 --> 00:21:47,280 graduate from college these kinds of 520 00:21:45,659 --> 00:21:51,059 skills are really really helpful to them 521 00:21:47,280 --> 00:21:52,679 so so those are benefits so 522 00:21:51,059 --> 00:21:56,520 so 523 00:21:52,679 --> 00:21:58,740 it was in 2013 so I I was doing all this 524 00:21:56,520 --> 00:22:02,700 closely I collaborated with my colleague 525 00:21:58,740 --> 00:22:04,919 Norm Medeiros at Haverford and in 2013 we 526 00:22:02,700 --> 00:22:06,659 started calling it Project TIER we 527 00:22:04,919 --> 00:22:08,100 lead faculty development workshops 528 00:22:06,659 --> 00:22:09,000 and those are the main things we've been 529 00:22:08,100 --> 00:22:11,780 doing 530 00:22:09,000 --> 00:22:11,780 and 531 00:22:12,960 --> 00:22:17,220 and 532 00:22:15,059 --> 00:22:20,059 we have had a lot of success of a 533 00:22:17,220 --> 00:22:22,980 certain kind we've had a lot of success 534 00:22:20,059 --> 00:22:25,500 at connecting with individual faculty 535 00:22:22,980 --> 00:22:27,780 members and helping them figure out ways 536 00:22:25,500 --> 00:22:30,419 to do things in their classes and 537 00:22:27,780 --> 00:22:33,320 incorporate this reproducibility I mean 538 00:22:30,419 --> 00:22:33,320 sort of our 539 00:22:34,220 --> 00:22:38,580 sort of paradigm way of working with 540 00:22:36,600 --> 00:22:40,080 people is they come to these workshops 541 00:22:38,580 --> 00:22:42,179 they've been teaching some
quantitative 542 00:22:40,080 --> 00:22:44,340 methods course they have various assignments 543 00:22:42,179 --> 00:22:47,159 but it's all done point-and-click or 544 00:22:44,340 --> 00:22:48,419 just interactively they know they should 545 00:22:47,159 --> 00:22:50,820 do something different but don't quite 546 00:22:48,419 --> 00:22:55,020 know what to do and this just helps them 547 00:22:50,820 --> 00:22:56,580 get over that hurdle and and I 548 00:22:55,020 --> 00:22:59,220 so 549 00:22:56,580 --> 00:23:03,120 and and that's been our main 550 00:22:59,220 --> 00:23:05,340 say target audience as instructors and 551 00:23:03,120 --> 00:23:08,039 if we count up we'd probably find 552 00:23:05,340 --> 00:23:10,799 something in the hundreds of instructors 553 00:23:08,039 --> 00:23:13,880 who have started doing this which for me 554 00:23:10,799 --> 00:23:13,880 and Norm just kind of 555 00:23:14,039 --> 00:23:19,020 putting out 556 00:23:16,140 --> 00:23:20,760 messages saying come to this workshop uh 557 00:23:19,020 --> 00:23:22,200 we're fairly happy about that 558 00:23:20,760 --> 00:23:26,039 but now we've been doing this for 10 559 00:23:22,200 --> 00:23:28,460 years and it's working out and we've 560 00:23:26,039 --> 00:23:31,620 learned a lot about it 561 00:23:28,460 --> 00:23:33,960 and so we're now really at an inflection 562 00:23:31,620 --> 00:23:37,260 point where we think we should 563 00:23:33,960 --> 00:23:41,280 step it up a bit 564 00:23:37,260 --> 00:23:44,640 and a big piece of what makes us think 565 00:23:41,280 --> 00:23:47,520 this should be broadcast more widely is 566 00:23:44,640 --> 00:23:49,919 we've discovered like how flexible this 567 00:23:47,520 --> 00:23:52,260 whole idea is so if you look at the TIER 568 00:23:49,919 --> 00:23:54,240 Protocol online it's very prescriptive it says 569 00:23:52,260 --> 00:23:56,520 put these folders exactly here and call 570 00:23:54,240 --> 00:23:58,200 them exactly this and the reason for 571
00:23:56,520 --> 00:24:00,480 that is not because we care exactly 572 00:23:58,200 --> 00:24:01,860 about those things it's that students 573 00:24:00,480 --> 00:24:04,500 don't know 574 00:24:01,860 --> 00:24:07,620 so for people with experience you can 575 00:24:04,500 --> 00:24:09,059 say okay organize your folders in some 576 00:24:07,620 --> 00:24:11,159 reasonable way 577 00:24:09,059 --> 00:24:13,500 but students don't know what you're 578 00:24:11,159 --> 00:24:14,880 talking about so just to make it 579 00:24:13,500 --> 00:24:17,580 concrete we give them like specific 580 00:24:14,880 --> 00:24:19,440 directions but then then we're very explicit 581 00:24:17,580 --> 00:24:21,240 about but then if you've got reasons to 582 00:24:19,440 --> 00:24:23,280 change that do it the way it works for 583 00:24:21,240 --> 00:24:24,600 you and use whatever kind of software 584 00:24:23,280 --> 00:24:26,880 you want 585 00:24:24,600 --> 00:24:28,280 use whatever file sharing platform 586 00:24:26,880 --> 00:24:31,679 you want 587 00:24:28,280 --> 00:24:34,860 and you can do this in a whole range of 588 00:24:31,679 --> 00:24:36,840 exercises so the whole TIER Protocol is 589 00:24:34,860 --> 00:24:38,460 written as for like a senior thesis or a 590 00:24:36,840 --> 00:24:41,340 complete research paper 591 00:24:38,460 --> 00:24:43,620 but it can be scaled down to smaller 592 00:24:41,340 --> 00:24:46,200 exercises I was talking about like your 593 00:24:43,620 --> 00:24:50,039 very first introductory homework problem 594 00:24:46,200 --> 00:24:51,299 uh it can work with that so there's huge 595 00:24:50,039 --> 00:24:53,820 flexibility 596 00:24:51,299 --> 00:24:56,100 so and so it's like applicable at all 597 00:24:53,820 --> 00:24:58,919 different levels you know there's some core 598 00:24:56,100 --> 00:25:01,860 principles uh 599 00:24:58,919 --> 00:25:04,320 but they can be applied very flexibly in 600 00:25:01,860 --> 00:25:06,299 a lot of situations so 601 00:25:04,320
--> 00:25:09,179 so the 602 00:25:06,299 --> 00:25:11,580 we would like to find a way to go beyond 603 00:25:09,179 --> 00:25:14,340 just individual faculty members and get 604 00:25:11,580 --> 00:25:15,600 a little more critical mass going and we 605 00:25:14,340 --> 00:25:19,500 would 606 00:25:15,600 --> 00:25:20,940 love to see departmental coordination so 607 00:25:19,500 --> 00:25:22,799 that students don't just get this 608 00:25:20,940 --> 00:25:25,620 randomly in one class and then forget 609 00:25:22,799 --> 00:25:28,279 about it but it's what they do all along 610 00:25:25,620 --> 00:25:28,279 uh 611 00:25:32,340 --> 00:25:36,419 so 612 00:25:34,679 --> 00:25:39,500 we would and we would like to get more 613 00:25:36,419 --> 00:25:39,500 people involved 614 00:25:39,840 --> 00:25:43,159 we would like to get more 615 00:25:45,600 --> 00:25:52,380 so we'd like to get more people involved 616 00:25:47,640 --> 00:25:54,360 in writing curriculum uh so far we most 617 00:25:52,380 --> 00:25:57,539 everything we have posted is stuff that 618 00:25:54,360 --> 00:26:02,039 we've written and 619 00:25:57,539 --> 00:26:04,320 that doesn't scale very well uh so and 620 00:26:02,039 --> 00:26:06,620 the thing is 621 00:26:04,320 --> 00:26:09,720 if people do this in their classes then 622 00:26:06,620 --> 00:26:12,539 they can take what they've done in their 623 00:26:09,720 --> 00:26:15,539 homework problems or exercises and we 624 00:26:12,539 --> 00:26:17,520 could put that up on the TIER website so 625 00:26:15,539 --> 00:26:21,720 so that's 626 00:26:17,520 --> 00:26:24,360 the framework right now is how to how to 627 00:26:21,720 --> 00:26:27,840 scale up and get greater involvement 628 00:26:24,360 --> 00:26:30,539 beyond just individuals 629 00:26:27,840 --> 00:26:32,340 Christians so I hope so you will be 630 00:26:30,539 --> 00:26:35,360 interested and we'll hear from you or 631 00:26:32,340 --> 00:26:35,360 have opportunities to talk 632 00:26:37,500 -->
00:26:42,419 all right uh thanks thanks Richard 633 00:26:40,080 --> 00:26:44,700 um uh that's great and I definitely want 634 00:26:42,419 --> 00:26:48,620 to dig into some of these topics but 635 00:26:44,700 --> 00:26:48,620 first first we have to hear from Lars um 636 00:26:52,039 --> 00:26:57,720 okay yeah yeah completely burning um so 637 00:26:55,679 --> 00:27:00,299 how much time do you spend in class 638 00:26:57,720 --> 00:27:03,960 showing best practices about how to 639 00:27:00,299 --> 00:27:06,179 actually organize your files and that 640 00:27:03,960 --> 00:27:09,179 sort of thing is that something that is 641 00:27:06,179 --> 00:27:09,179 in 642 00:27:12,200 --> 00:27:18,440 so like how much time do I spend 643 00:27:15,000 --> 00:27:18,440 explaining it to students 644 00:27:20,539 --> 00:27:25,140 because they say hey this this is how I 645 00:27:23,340 --> 00:27:27,000 organize this is how it works for me 646 00:27:25,140 --> 00:27:28,860 maybe start here so you don't have to 647 00:27:27,000 --> 00:27:30,600 reinvent the wheel and then kind of make 648 00:27:28,860 --> 00:27:33,059 little perturbations to see what works 649 00:27:30,600 --> 00:27:35,580 for you yeah so 650 00:27:33,059 --> 00:27:37,679 uh that's what we sort of spent five or 651 00:27:35,580 --> 00:27:39,720 six years it took like five or six years 652 00:27:37,679 --> 00:27:42,659 to kind of come up with a general scheme 653 00:27:39,720 --> 00:27:45,539 and if you look at the Project TIER 654 00:27:42,659 --> 00:27:49,620 website projecttier.org and look at the 655 00:27:45,539 --> 00:27:51,900 TIER Protocol 4.0 that's like a template 656 00:27:49,620 --> 00:27:54,000 of all these folders and what goes in 657 00:27:51,900 --> 00:27:56,159 them and how they're organized and 658 00:27:54,000 --> 00:27:58,860 what's in a readme file 659 00:27:56,159 --> 00:28:02,159 and and so that's sort of the starting 660 00:27:58,860 --> 00:28:04,880 place and actually because that's posted
661 00:28:02,159 --> 00:28:08,419 now I don't have to spend that much time 662 00:28:04,880 --> 00:28:13,140 I say set up these set up these folders 663 00:28:08,419 --> 00:28:15,059 and read up read on the website what 664 00:28:13,140 --> 00:28:16,380 goes into the readme file and how these do-files 665 00:28:15,059 --> 00:28:19,140 work 666 00:28:16,380 --> 00:28:22,520 uh so 667 00:28:19,140 --> 00:28:22,520 so that's 668 00:28:23,580 --> 00:28:28,020 one of the main things we've done is 669 00:28:25,200 --> 00:28:30,120 give people a template to start with and 670 00:28:28,020 --> 00:28:31,620 then tweak and adjust that as you like so please 671 00:28:30,120 --> 00:28:34,039 have a look and please get in touch with 672 00:28:31,620 --> 00:28:34,039 your questions 673 00:28:34,679 --> 00:28:38,100 okay um 674 00:28:36,600 --> 00:28:40,620 let's see 675 00:28:38,100 --> 00:28:42,299 well I did feel duty bound to to tease 676 00:28:40,620 --> 00:28:44,880 Lars um I am looking forward to his 677 00:28:42,299 --> 00:28:46,740 comments Lars and I have frequently uh 678 00:28:44,880 --> 00:28:50,100 discussed let me get my screen sharing 679 00:28:46,740 --> 00:28:54,240 going uh frequently discussed 680 00:28:50,100 --> 00:28:55,799 um trying to expand reproducibility at 681 00:28:54,240 --> 00:28:58,200 federal agencies especially when it 682 00:28:55,799 --> 00:28:59,400 comes to working with restricted data um 683 00:28:58,200 --> 00:29:01,440 and so something we've talked about a 684 00:28:59,400 --> 00:29:03,179 lot is how to incorporate students both 685 00:29:01,440 --> 00:29:04,679 undergraduate and graduate as sort of 686 00:29:03,179 --> 00:29:07,320 embedded interns inside the 687 00:29:04,679 --> 00:29:09,779 organizations to to facilitate some of 688 00:29:07,320 --> 00:29:12,120 that work and so I think Lars is well I 689 00:29:09,779 --> 00:29:13,919 know Lars is about to talk about this so 690 00:29:12,120 --> 00:29:16,980 um yeah well Lars let's let's hear
it 691 00:29:13,919 --> 00:29:18,720 okay so um thanks uh thanks everybody 692 00:29:16,980 --> 00:29:22,440 for joining as well 693 00:29:18,720 --> 00:29:25,200 um and I I think what I'll be talking about 694 00:29:22,440 --> 00:29:27,179 here piggybacks nicely on top of what 695 00:29:25,200 --> 00:29:29,399 you guys laid the groundwork for 696 00:29:27,179 --> 00:29:30,899 um because one of the things uh that you 697 00:29:29,399 --> 00:29:31,919 mentioned is that this is actually of 698 00:29:30,899 --> 00:29:35,820 really 699 00:29:31,919 --> 00:29:37,860 excellent use in later professional life 700 00:29:35,820 --> 00:29:39,419 one of the realizations in actually 701 00:29:37,860 --> 00:29:42,059 running the uh Ian do you want to just move 702 00:29:39,419 --> 00:29:45,000 forward a slide 703 00:29:42,059 --> 00:29:48,299 one of the things that I realized when 704 00:29:45,000 --> 00:29:50,820 running the AEA replication lab is that 705 00:29:48,299 --> 00:29:54,419 many of the skills that Richard had 706 00:29:50,820 --> 00:29:56,460 pointed out before are not present in my 707 00:29:54,419 --> 00:29:58,380 student worker body that I work with 708 00:29:56,460 --> 00:30:00,960 and so I actually need to train them up 709 00:29:58,380 --> 00:30:02,640 on those skills at least to the level 710 00:30:00,960 --> 00:30:04,500 that they can verify not necessarily 711 00:30:02,640 --> 00:30:06,059 actively produce what is needed to do 712 00:30:04,500 --> 00:30:07,919 there so they need to recognize data 713 00:30:06,059 --> 00:30:09,360 citations it's one of the very first 714 00:30:07,919 --> 00:30:12,020 things that we do in the sort of 715 00:30:09,360 --> 00:30:15,299 intensive one-day training uh stuff etc 716 00:30:12,020 --> 00:30:17,159 so training undergraduates in this is is 717 00:30:15,299 --> 00:30:20,700 the ultimate goal right 718 00:30:17,159 --> 00:30:24,960 um how that occurs in my particular case 719 00:30:20,700 --> 00:30:27,299 happens not in a class but in
an RA work 720 00:30:24,960 --> 00:30:29,220 on campus but that's introducing them to 721 00:30:27,299 --> 00:30:31,440 what they might be doing afterwards 722 00:30:29,220 --> 00:30:34,200 uh next slide 723 00:30:31,440 --> 00:30:36,240 so um the background is that the work 724 00:30:34,200 --> 00:30:38,700 that we accumulate 725 00:30:36,240 --> 00:30:41,039 um in in my lab for all the eight 726 00:30:38,700 --> 00:30:44,220 journals of the aea 727 00:30:41,039 --> 00:30:47,220 um is doing this at scale so I train uh 728 00:30:44,220 --> 00:30:49,260 normally around 15 or 20 students per uh 729 00:30:47,220 --> 00:30:51,000 every four months 730 00:30:49,260 --> 00:30:52,740 to come into my lab to help me with this 731 00:30:51,000 --> 00:30:54,840 there's normal transition of folks in 732 00:30:52,740 --> 00:30:56,700 and out of this but a good chunk of them 733 00:30:54,840 --> 00:30:58,440 actually stay until they graduate so 734 00:30:56,700 --> 00:30:59,760 this is a long-term engagement with them 735 00:30:58,440 --> 00:31:02,460 it's more than just a class which 736 00:30:59,760 --> 00:31:04,080 disappears which if it if 737 00:31:02,460 --> 00:31:05,580 reproducibility techniques appeared in 738 00:31:04,080 --> 00:31:07,980 every class would be a similar type of 739 00:31:05,580 --> 00:31:10,020 engagement right 740 00:31:07,980 --> 00:31:12,600 um yeah next slide 741 00:31:10,020 --> 00:31:14,520 so the basic idea is that we start with 742 00:31:12,600 --> 00:31:15,840 what Diego laid out we have to first 743 00:31:14,520 --> 00:31:18,899 figure out where in a replication 744 00:31:15,840 --> 00:31:20,340 package data comes from right and so 745 00:31:18,899 --> 00:31:21,779 figuring that out what are they actually 746 00:31:20,340 --> 00:31:24,539 naming where do they get the data from 747 00:31:21,779 --> 00:31:26,520 how do others get at the data is core 748 00:31:24,539 --> 00:31:28,380 part of this but then the rest of it 749 00:31:26,520 --> 
00:31:29,700 assuming that you somehow got access to 750 00:31:28,380 --> 00:31:31,440 the data means that you're actually 751 00:31:29,700 --> 00:31:33,960 going to be running code which hopefully 752 00:31:31,440 --> 00:31:36,480 is organized as Richard laid it out many 753 00:31:33,960 --> 00:31:38,700 times it's not but trying to figure out 754 00:31:36,480 --> 00:31:41,580 what that other person thought that the 755 00:31:38,700 --> 00:31:44,279 next person running this was going to do 756 00:31:41,580 --> 00:31:46,500 is the key part that all of these RAs 757 00:31:44,279 --> 00:31:48,000 are essentially the first reader of 758 00:31:46,500 --> 00:31:50,520 right 759 00:31:48,000 --> 00:31:52,640 and so over the course of doing this 760 00:31:50,520 --> 00:31:55,860 they repeat repeat 761 00:31:52,640 --> 00:31:58,440 uh there's actually enormous uh learning 762 00:31:55,860 --> 00:32:02,340 going on here they uh scale up relatively 763 00:31:58,440 --> 00:32:03,960 quickly over the course of a year uh the 764 00:32:02,340 --> 00:32:06,600 um 40 students who at some point in time in 765 00:32:03,960 --> 00:32:10,740 the year are in my lab help me review 766 00:32:06,600 --> 00:32:12,720 about 450 articles okay 767 00:32:10,740 --> 00:32:14,940 um but one of the key things that 768 00:32:12,720 --> 00:32:17,520 happens is that when we can't get access 769 00:32:14,940 --> 00:32:20,220 to the data we can do some sort of 770 00:32:17,520 --> 00:32:22,380 uh we can look at it it's plausible that 771 00:32:20,220 --> 00:32:25,020 it works but we can't get it what is 772 00:32:22,380 --> 00:32:26,700 that kind of data it's restricted-access 773 00:32:25,020 --> 00:32:28,260 data where you need to apply for data 774 00:32:26,700 --> 00:32:30,299 that could be because it needs to be 775 00:32:28,260 --> 00:32:32,039 purchased it could be because you have 776 00:32:30,299 --> 00:32:34,140 to have a data use agreement with New 777 00:32:32,039 --> 00:32:36,120 York State or
something like that or it 778 00:32:34,140 --> 00:32:39,059 could be data that is confidential data 779 00:32:36,120 --> 00:32:40,380 in the FSRDC or other federal agencies 780 00:32:39,059 --> 00:32:42,240 okay 781 00:32:40,380 --> 00:32:44,940 and we have interacted with such 782 00:32:42,240 --> 00:32:47,760 entities on a regular basis where RAs 783 00:32:44,940 --> 00:32:50,159 say at the Federal Reserve have run code 784 00:32:47,760 --> 00:32:52,020 for us of authors that use data within 785 00:32:50,159 --> 00:32:53,580 the Federal Reserve data infrastructure 786 00:32:52,020 --> 00:32:57,419 etc 787 00:32:53,580 --> 00:32:59,460 um we've had staff at the BLS run code 788 00:32:57,419 --> 00:33:01,919 for us when we could not access it in a 789 00:32:59,460 --> 00:33:06,659 timely fashion but we only scratched the 790 00:33:01,919 --> 00:33:09,419 surface in doing so okay uh in for it 791 00:33:06,659 --> 00:33:11,880 um so 792 00:33:09,419 --> 00:33:13,320 by taking these undergraduates and 793 00:33:11,880 --> 00:33:17,039 giving them these skills we actually 794 00:33:13,320 --> 00:33:19,200 think that we can also get at this data 795 00:33:17,039 --> 00:33:20,100 as well okay the skills we're teaching 796 00:33:19,200 --> 00:33:22,019 them 797 00:33:20,100 --> 00:33:24,299 are going to be 798 00:33:22,019 --> 00:33:26,220 um as I lay out practiced within an 799 00:33:24,299 --> 00:33:27,899 internship which replicates what we do 800 00:33:26,220 --> 00:33:29,100 at the journal but with a slightly 801 00:33:27,899 --> 00:33:31,679 different focus 802 00:33:29,100 --> 00:33:33,779 okay and it teaches them important 803 00:33:31,679 --> 00:33:36,120 skills this is not just about how do I 804 00:33:33,779 --> 00:33:38,580 organize my data right 805 00:33:36,120 --> 00:33:40,799 um it's about visualizing best practices 806 00:33:38,580 --> 00:33:42,000 of the 10 cases that you've seen over 807 00:33:40,799 --> 00:33:44,580 the course of a year which ones went
808 00:33:42,000 --> 00:33:48,059 well which ones did not 809 00:33:44,580 --> 00:33:50,940 um communication skills uh all of these 810 00:33:48,059 --> 00:33:54,360 students write at the end of each paper 811 00:33:50,940 --> 00:33:56,640 a report that I reviewed right but I 812 00:33:54,360 --> 00:33:58,799 only minorly edit these they need to 813 00:33:56,640 --> 00:34:00,779 convey to me when something fail I need 814 00:33:58,799 --> 00:34:03,360 to assess whether it's their problem or 815 00:34:00,779 --> 00:34:04,740 the author's problem right and we go 816 00:34:03,360 --> 00:34:06,059 back and forth and they improve those 817 00:34:04,740 --> 00:34:07,440 communication skills to me but 818 00:34:06,059 --> 00:34:09,839 ultimately they're communicating up 819 00:34:07,440 --> 00:34:12,659 right they are writing a report which 820 00:34:09,839 --> 00:34:16,200 will be read by a Nobel Laureate who 821 00:34:12,659 --> 00:34:18,659 submitted a paper okay and so how to do 822 00:34:16,200 --> 00:34:21,839 that in a way that remains objective 823 00:34:18,659 --> 00:34:23,460 scientific uh evidence-based 824 00:34:21,839 --> 00:34:25,679 um where is the problem needs to be 825 00:34:23,460 --> 00:34:28,740 communicated uh clearly okay so they 826 00:34:25,679 --> 00:34:30,419 learn that they learn what's called 827 00:34:28,740 --> 00:34:32,460 um the document and data management 828 00:34:30,419 --> 00:34:34,379 curation skills they need to find some 829 00:34:32,460 --> 00:34:36,000 of the data based on the description 830 00:34:34,379 --> 00:34:37,379 that's there they need to organize it 831 00:34:36,000 --> 00:34:38,940 the way that the author said they don't 832 00:34:37,379 --> 00:34:40,379 have to invent it on their own what they 833 00:34:38,940 --> 00:34:41,760 need to do 834 00:34:40,379 --> 00:34:42,960 um but they need to sort of do all these 835 00:34:41,760 --> 00:34:44,639 all these are workflow and 836 00:34:42,960 --> 00:34:47,159 
reproducibility skills that are being 837 00:34:44,639 --> 00:34:49,260 taught here that are valuable I believe 838 00:34:47,159 --> 00:34:51,720 in the professional context and given 839 00:34:49,260 --> 00:34:54,540 the sort of anecdotal evidence I get 840 00:34:51,720 --> 00:34:56,399 back from from my former RAs uh are 841 00:34:54,540 --> 00:34:57,720 actually valuable okay so they're useful 842 00:34:56,399 --> 00:35:00,000 in graduate studies they're useful in 843 00:34:57,720 --> 00:35:02,339 non-academic workplaces 844 00:35:00,000 --> 00:35:03,839 what do students do to get a foot into 845 00:35:02,339 --> 00:35:05,040 the door for these non-academic 846 00:35:03,839 --> 00:35:08,099 workplaces is among other things 847 00:35:05,040 --> 00:35:11,400 internships okay so how can we combine 848 00:35:08,099 --> 00:35:14,160 that into a single scenario where we can 849 00:35:11,400 --> 00:35:17,520 get them into scenarios where they can 850 00:35:14,160 --> 00:35:19,260 be RAs after graduation gain some skills 851 00:35:17,520 --> 00:35:20,700 beforehand gain some experience with 852 00:35:19,260 --> 00:35:22,859 what's their data action about these 853 00:35:20,700 --> 00:35:24,720 kinds of things as well as give back to 854 00:35:22,859 --> 00:35:28,560 those places because getting somebody to 855 00:35:24,720 --> 00:35:31,619 agree to host an intern comes with uh 856 00:35:28,560 --> 00:35:33,300 challenges financial obligations uh most 857 00:35:31,619 --> 00:35:35,280 of the time the thing that I encounter 858 00:35:33,300 --> 00:35:36,420 is it's a lot of work so something needs 859 00:35:35,280 --> 00:35:37,940 to be given back and we've got some 860 00:35:36,420 --> 00:35:39,839 ideas on how to do that 861 00:35:37,940 --> 00:35:42,300 okay 862 00:35:39,839 --> 00:35:44,760 um I think we can skip over that uh it 863 00:35:42,300 --> 00:35:46,560 helps in econ that we don't we aren't 864 00:35:44,760 --> 00:35:50,579 particularly diverse in the methods
that 865 00:35:46,560 --> 00:35:52,260 we use 70 percent is Stata uh the rest is MATLAB 866 00:35:50,579 --> 00:35:54,540 uh and then there's a smattering of R 867 00:35:52,260 --> 00:35:55,920 and Python and whatever it means they 868 00:35:54,540 --> 00:35:57,420 don't actually need to learn that many 869 00:35:55,920 --> 00:35:59,280 different skills but they are going to 870 00:35:57,420 --> 00:36:01,859 be faced with different skills so there 871 00:35:59,280 --> 00:36:03,599 will be the occasional Python or R 872 00:36:01,859 --> 00:36:05,940 program so they also need to recognize 873 00:36:03,599 --> 00:36:07,800 when they can't actually do it because 874 00:36:05,940 --> 00:36:10,020 they don't have the right understanding 875 00:36:07,800 --> 00:36:11,880 of it but the way that authors describe 876 00:36:10,020 --> 00:36:13,619 how to run R or Stata or MATLAB 877 00:36:11,880 --> 00:36:15,000 differs as well certain styles of 878 00:36:13,619 --> 00:36:17,579 communicating how you think about that 879 00:36:15,000 --> 00:36:19,560 there are certain ways of thinking that 880 00:36:17,579 --> 00:36:20,940 um the worst I must argue are those who 881 00:36:19,560 --> 00:36:22,200 provide Fortran programs because they 882 00:36:20,940 --> 00:36:24,300 just dump them on you and assume you 883 00:36:22,200 --> 00:36:26,040 know what to do with them and that 884 00:36:24,300 --> 00:36:27,300 happens to be the most diverse way of 885 00:36:26,040 --> 00:36:28,740 doing it because there's so many 886 00:36:27,300 --> 00:36:31,680 different compilers out there but that's 887 00:36:28,740 --> 00:36:33,060 just an aside okay so 888 00:36:31,680 --> 00:36:37,440 um 889 00:36:33,060 --> 00:36:39,060 we train them uh on a variety of skills 890 00:36:37,440 --> 00:36:41,520 um recognizing the data as Diego 891 00:36:39,060 --> 00:36:43,920 mentioned uh the ability to describe the 892 00:36:41,520 --> 00:36:45,300 data once they found it so recognizing 893 00:36:43,920
--> 00:36:47,820 that you actually haven't found the same 894 00:36:45,300 --> 00:36:50,400 data is just as important tracing data 895 00:36:47,820 --> 00:36:52,140 recognizing data access conditions Diego 896 00:36:50,400 --> 00:36:54,180 has it easy he just points them to the 897 00:36:52,140 --> 00:36:57,359 FRED website right 898 00:36:54,180 --> 00:36:58,980 um but to recognize what these 899 00:36:57,359 --> 00:37:00,420 various data access conditions actually 900 00:36:58,980 --> 00:37:02,760 mean when they need to get access to 901 00:37:00,420 --> 00:37:04,560 them that you might need to apply with a 902 00:37:02,760 --> 00:37:07,079 three-page proposal and a one and a half 903 00:37:04,560 --> 00:37:08,940 year security clearance process versus I 904 00:37:07,079 --> 00:37:10,740 need to sign up here and just provide my 905 00:37:08,940 --> 00:37:14,160 email and download it there's a wide 906 00:37:10,740 --> 00:37:15,960 range between those two extremes that is 907 00:37:14,160 --> 00:37:17,099 something where they're not usually 908 00:37:15,960 --> 00:37:18,420 exposed to that as part of their 909 00:37:17,099 --> 00:37:20,760 undergraduate studies but they are 910 00:37:18,420 --> 00:37:23,339 exposed to that here and as an intern 911 00:37:20,760 --> 00:37:25,680 they may also be inside and realize that 912 00:37:23,339 --> 00:37:27,720 now that access is different but I might 913 00:37:25,680 --> 00:37:30,300 still need to be put on a project and 914 00:37:27,720 --> 00:37:34,380 participate in that and justify that 915 00:37:30,300 --> 00:37:36,720 okay uh among my RAs I I tend to uh pick 916 00:37:34,380 --> 00:37:39,060 one or two out that sort of do repeating 917 00:37:36,720 --> 00:37:42,119 uh data access requests so we use the 918 00:37:39,060 --> 00:37:44,280 IPUMS API we teach an RA on how to 919 00:37:42,119 --> 00:37:47,339 leverage that API we use we often get 920 00:37:44,280 --> 00:37:49,200 papers that use uh demographic uh DHS 921
00:37:47,339 --> 00:37:51,000 demographic household survey 922 00:37:49,200 --> 00:37:53,339 um name I think the acronym has changed 923 00:37:51,000 --> 00:37:55,380 a few times and you need to put in a 924 00:37:53,339 --> 00:37:57,180 request for the specific files in doing 925 00:37:55,380 --> 00:37:59,339 that so we train students on these kinds 926 00:37:57,180 --> 00:38:01,320 of features all those are valuable in 927 00:37:59,339 --> 00:38:04,079 terms of um then going forward into 928 00:38:01,320 --> 00:38:06,320 other scenarios 930 00:38:10,200 --> 00:38:14,400 so we have lots of guidance I'm actually 931 00:38:12,780 --> 00:38:16,260 thinking that data and code guidance 932 00:38:14,400 --> 00:38:18,560 about data citations are going to put 933 00:38:16,260 --> 00:38:20,760 the self-test for your thing in there 934 00:38:18,560 --> 00:38:22,260 because that's where we try to 935 00:38:20,760 --> 00:38:24,180 disentangle how do you actually cite 936 00:38:22,260 --> 00:38:26,040 data when you're told you actually this 937 00:38:24,180 --> 00:38:28,200 is confidential data etc there's lots of 938 00:38:26,040 --> 00:38:30,000 examples in there it's what we use to 939 00:38:28,200 --> 00:38:31,740 train the students it's what I point 940 00:38:30,000 --> 00:38:34,200 authors to when they tell me I can't 941 00:38:31,740 --> 00:38:36,060 cite the data etc so the training is 942 00:38:34,200 --> 00:38:38,700 actually quite similar 943 00:38:36,060 --> 00:38:41,640 um so debugging code right 944 00:38:38,700 --> 00:38:45,839 um after a number of runs they learn 945 00:38:41,640 --> 00:38:48,000 that debugging code is is not trivial uh 946 00:38:45,839 --> 00:38:49,440 sometimes it's just babysitting programs 947 00:38:48,000 --> 00:38:51,780 because there might not be a main file 948 00:38:49,440 --> 00:38:55,020 and it matters in which order they 949 00:38:51,780 --> 00:38:56,940 are run in other cases we go back
to 950 00:38:55,020 --> 00:38:59,280 what Richard said something is failing 951 00:38:56,940 --> 00:39:01,020 show me where it's failing uh the 952 00:38:59,280 --> 00:39:03,660 software we tend to use is not super 953 00:39:01,020 --> 00:39:05,460 helpful with that either because an 954 00:39:03,660 --> 00:39:07,680 error in Stata might manifest itself 955 00:39:05,460 --> 00:39:09,420 500 lines further on in a log file and 956 00:39:07,680 --> 00:39:10,680 figuring out that that actually needs to 957 00:39:09,420 --> 00:39:12,540 be traced back 958 00:39:10,680 --> 00:39:14,280 is something that they always stumble 959 00:39:12,540 --> 00:39:15,960 over at least the first time 960 00:39:14,280 --> 00:39:19,700 um and so those are skills that they get 961 00:39:15,960 --> 00:39:19,700 from uh this training as well 962 00:39:21,240 --> 00:39:25,200 um so 963 00:39:23,160 --> 00:39:27,060 um that ability to debug and to 964 00:39:25,200 --> 00:39:28,980 communicate that to others whether it be 965 00:39:27,060 --> 00:39:31,140 their supervisor in this case me or to 966 00:39:28,980 --> 00:39:33,619 the authors is something that that comes 967 00:39:31,140 --> 00:39:33,619 out of this 968 00:39:34,680 --> 00:39:39,119 um 969 00:39:35,820 --> 00:39:40,800 so this I just wanted to focus on that 970 00:39:39,119 --> 00:39:43,500 all that is then condensed into a report 971 00:39:40,800 --> 00:39:45,180 that summarizes that information 972 00:39:43,500 --> 00:39:46,560 sometimes it's not supposed to summarize 973 00:39:45,180 --> 00:39:48,240 because you need to get at where the 974 00:39:46,560 --> 00:39:51,359 error is actually happening but so how 975 00:39:48,240 --> 00:39:55,079 to keep a report uh reasonable while 976 00:39:51,359 --> 00:39:56,760 pointing to where other things are is 977 00:39:55,079 --> 00:39:58,920 another one of those skills we give them 978 00:39:56,760 --> 00:40:00,480 a lot of guidance on that uh as Richard 979 00:39:58,920 --> 00:40:01,800
said they might not know what that looks 980 00:40:00,480 --> 00:40:04,740 like but we give them a template that 981 00:40:01,800 --> 00:40:07,020 they fill out and structure to do that 982 00:40:04,740 --> 00:40:09,380 and again they get better at this over 983 00:40:07,020 --> 00:40:09,380 time 984 00:40:10,380 --> 00:40:13,260 um I've already mentioned the 985 00:40:11,579 --> 00:40:15,560 communicate to the ability to 986 00:40:13,260 --> 00:40:18,839 communicate up um 987 00:40:15,560 --> 00:40:20,220 the um these are sort of um so some of 988 00:40:18,839 --> 00:40:21,060 the softer skills that they learn about 989 00:40:20,220 --> 00:40:23,579 this 990 00:40:21,060 --> 00:40:26,820 the ability to sort of put it into 991 00:40:23,579 --> 00:40:29,940 objective language uh one of the things 992 00:40:26,820 --> 00:40:31,800 that I convey is that there isn't bad 993 00:40:29,940 --> 00:40:34,440 code there is code that doesn't produce 994 00:40:31,800 --> 00:40:35,280 what's supposed to be produced 995 00:40:34,440 --> 00:40:37,800 um 996 00:40:35,280 --> 00:40:39,480 that's one way of sort of explaining in 997 00:40:37,800 --> 00:40:41,760 two different ways and that is something 998 00:40:39,480 --> 00:40:43,800 that doesn't necessarily come naturally 999 00:40:41,760 --> 00:40:45,720 um to to students as well 1000 00:40:43,800 --> 00:40:47,099 uh but that's extremely important when 1001 00:40:45,720 --> 00:40:48,839 you're working in a complex environment 1002 00:40:47,099 --> 00:40:52,020 such as a federal agency 1003 00:40:48,839 --> 00:40:55,380 next one so the opportunity is to bring 1004 00:40:52,020 --> 00:40:57,780 all this together we're doing this for a 1005 00:40:55,380 --> 00:41:00,240 journal but the basic idea is you have a 1006 00:40:57,780 --> 00:41:02,760 project that's been described in some 1007 00:41:00,240 --> 00:41:05,400 fashion by their authors and it has 1008 00:41:02,760 --> 00:41:08,339 certain features reproducible or not 1009 
00:41:05,400 --> 00:41:10,380 that adhere to it 1010 00:41:08,339 --> 00:41:12,300 um we are interested in as a journal 1011 00:41:10,380 --> 00:41:13,800 from we can't get access to this data an 1012 00:41:12,300 --> 00:41:15,720 easy way so why don't we just send a 1013 00:41:13,800 --> 00:41:18,240 couple of students in next summer work 1014 00:41:15,720 --> 00:41:19,680 down a list of doing these things 1015 00:41:18,240 --> 00:41:22,500 um 1016 00:41:19,680 --> 00:41:23,400 um and stockpile essentially those data 1017 00:41:22,500 --> 00:41:25,500 um 1018 00:41:23,400 --> 00:41:28,380 many other journals don't even have the 1019 00:41:25,500 --> 00:41:30,660 resources that the AEA has here so um 1020 00:41:28,380 --> 00:41:32,640 part of this could very well be and 1021 00:41:30,660 --> 00:41:34,320 we've trialled some of this with with 1022 00:41:32,640 --> 00:41:36,599 graduate students as a pilot where we 1023 00:41:34,320 --> 00:41:38,760 just take a pile of papers published in 1024 00:41:36,599 --> 00:41:39,780 a journal in the last year or two and 1025 00:41:38,760 --> 00:41:41,700 we've done this with the Canadian 1026 00:41:39,780 --> 00:41:44,160 Journal of Economics and my co-organizer 1027 00:41:41,700 --> 00:41:45,900 of the session 1028 00:41:44,160 --> 00:41:47,040 um to to do that for the Canadian 1029 00:41:45,900 --> 00:41:48,660 Journal of Economics where we just 1030 00:41:47,040 --> 00:41:50,700 worked through a pile of papers with the 1031 00:41:48,660 --> 00:41:53,099 students that are there 1032 00:41:50,700 --> 00:41:55,859 um take the time it takes 1033 00:41:53,099 --> 00:41:58,380 and do the feedback or the verification 1034 00:41:55,859 --> 00:42:00,660 of reproducibility ex post 1035 00:41:58,380 --> 00:42:02,220 what happens when it fails well there's 1036 00:42:00,660 --> 00:42:03,839 a data editor in the room who can then 1037 00:42:02,220 --> 00:42:05,520 go back to the authors and say well you 1038 00:42:03,839 -->
00:42:08,700 actually committed to producing a 1039 00:42:05,520 --> 00:42:11,099 reproducible uh package fix this 1040 00:42:08,700 --> 00:42:12,420 um and then we'll have that public so it 1041 00:42:11,099 --> 00:42:14,579 might be of interest to other journals 1042 00:42:12,420 --> 00:42:15,839 to do this as a bring in an intern who's 1043 00:42:14,579 --> 00:42:17,460 been trained on these kinds of matters 1044 00:42:15,839 --> 00:42:19,380 or bringing a bunch of interns do it as 1045 00:42:17,460 --> 00:42:21,780 an educational exercise that's a summer 1046 00:42:19,380 --> 00:42:22,800 workshop Etc could be called an 1047 00:42:21,780 --> 00:42:24,300 internship because you might be 1048 00:42:22,800 --> 00:42:25,920 interning at the journal could be called 1049 00:42:24,300 --> 00:42:26,700 a summer workshop to do this kind of 1050 00:42:25,920 --> 00:42:28,619 stuff 1051 00:42:26,700 --> 00:42:30,839 next one 1052 00:42:28,619 --> 00:42:32,160 um sorry I've already said that 1053 00:42:30,839 --> 00:42:35,339 um so 1054 00:42:32,160 --> 00:42:37,380 um then taking this to 1055 00:42:35,339 --> 00:42:38,820 um uh how this relates to the other 1056 00:42:37,380 --> 00:42:41,280 activities when they actually come back 1057 00:42:38,820 --> 00:42:42,440 from such an exercise 1058 00:42:41,280 --> 00:42:44,880 um 1059 00:42:42,440 --> 00:42:46,680 they are typically not currently 1060 00:42:44,880 --> 00:42:49,200 learning this um in their coursework 1061 00:42:46,680 --> 00:42:50,820 even not in specific data science 1062 00:42:49,200 --> 00:42:52,940 classes 1063 00:42:50,820 --> 00:42:52,940 um 1064 00:42:54,119 --> 00:42:58,980 actually I probably should have skipped 1065 00:42:55,560 --> 00:43:00,720 the slide uh why don't we just skip that 1066 00:42:58,980 --> 00:43:03,660 um so 1067 00:43:00,720 --> 00:43:05,400 how useful is this right so how useful 1068 00:43:03,660 --> 00:43:07,380 could this be transferred into a 1069 00:43:05,400 --> 00:43:10,140 
professional environment 1070 00:43:07,380 --> 00:43:11,819 um I uh get about the typical internet 1071 00:43:10,140 --> 00:43:13,619 survey response rate even when I 1072 00:43:11,819 --> 00:43:15,599 surveyed my own former RAs when I send 1073 00:43:13,619 --> 00:43:18,480 them a survey link uh so we we tried 1074 00:43:15,599 --> 00:43:19,440 this uh we got a few responses 1075 00:43:18,480 --> 00:43:21,359 um 1076 00:43:19,440 --> 00:43:22,560 the responses we did get were generally 1077 00:43:21,359 --> 00:43:24,240 positive 1078 00:43:22,560 --> 00:43:26,760 um both in terms of the actual technical 1079 00:43:24,240 --> 00:43:28,200 skills learned and of the mindset that 1080 00:43:26,760 --> 00:43:30,359 was conveyed which is what you were 1081 00:43:28,200 --> 00:43:32,579 emphasizing right so the idea that you 1082 00:43:30,359 --> 00:43:35,040 can actually have people think about 1083 00:43:32,579 --> 00:43:36,720 this the notion I get back from the 1084 00:43:35,040 --> 00:43:38,640 students is that once they actually go 1085 00:43:36,720 --> 00:43:40,560 out especially into the non-academic 1086 00:43:38,640 --> 00:43:43,140 world the idea that they need to 1087 00:43:40,560 --> 00:43:46,380 actually review somebody else's code 1088 00:43:43,140 --> 00:43:48,119 before the firm or the agency or 1089 00:43:46,380 --> 00:43:49,680 whatever sends it out is really 1090 00:43:48,119 --> 00:43:51,240 important 1091 00:43:49,680 --> 00:43:54,119 um and that's not something that we 1092 00:43:51,240 --> 00:43:57,540 typically teach them in our classwork 1093 00:43:54,119 --> 00:43:59,460 um so there is some scope here also to 1094 00:43:57,540 --> 00:44:00,780 um that we've we've trialled in a 1095 00:43:59,460 --> 00:44:02,880 course that we've taught before to sort 1096 00:44:00,780 --> 00:44:04,380 of have some in-class peer review of 1097 00:44:02,880 --> 00:44:06,720 code that was written as part of this 1098 00:44:04,380 --> 00:44:08,579 exercise as
well and to provide feedback 1099 00:44:06,720 --> 00:44:09,720 on that 1100 00:44:08,579 --> 00:44:11,579 um 1101 00:44:09,720 --> 00:44:13,800 um that sociology a student for instance 1102 00:44:11,579 --> 00:44:16,140 has worked for a non-profit and 1103 00:44:13,800 --> 00:44:17,579 her takeaway was this helped a lot in 1104 00:44:16,140 --> 00:44:18,720 terms of documenting what they were 1105 00:44:17,579 --> 00:44:21,480 doing 1106 00:44:18,720 --> 00:44:22,740 um and conveying it up the pay grade 1107 00:44:21,480 --> 00:44:25,560 right 1108 00:44:22,740 --> 00:44:27,480 uh next one oh okay 1109 00:44:25,560 --> 00:44:30,180 um so translating this into an 1110 00:44:27,480 --> 00:44:31,980 internship would mean um sending the 1111 00:44:30,180 --> 00:44:35,940 training students 1112 00:44:31,980 --> 00:44:37,560 um uh through what I we currently use is 1113 00:44:35,940 --> 00:44:39,300 probably transferable as a curriculum it 1114 00:44:37,560 --> 00:44:41,040 might need to be adapted uh in terms of 1115 00:44:39,300 --> 00:44:43,560 how to do this 1116 00:44:41,040 --> 00:44:45,060 um if it is for a federal agency getting 1117 00:44:43,560 --> 00:44:46,980 them security clearance so you might 1118 00:44:45,060 --> 00:44:48,660 want to do this in January train them 1119 00:44:46,980 --> 00:44:51,359 and then get security clearance in the 1120 00:44:48,660 --> 00:44:54,420 summer when normal internships happen uh 1121 00:44:51,359 --> 00:44:55,380 sending them into an agency to do this 1122 00:44:54,420 --> 00:44:57,720 um 1123 00:44:55,380 --> 00:45:00,300 we can do this using articles that have 1124 00:44:57,720 --> 00:45:03,359 been published uh we can do this on 1125 00:45:00,300 --> 00:45:05,579 projects that are being worked on at the 1126 00:45:03,359 --> 00:45:08,280 agency by internal researchers or 1127 00:45:05,579 --> 00:45:10,920 federal researchers this is actually 1128 00:45:08,280 --> 00:45:12,720 done at some agencies but it certainly 1129 
00:45:10,920 --> 00:45:14,819 is at least from my impression not 1130 00:45:12,720 --> 00:45:18,780 standard practice of most agencies to do 1131 00:45:14,819 --> 00:45:20,640 that kind of review before it exits 1132 00:45:18,780 --> 00:45:23,220 um and so there's there's various 1133 00:45:20,640 --> 00:45:25,859 opportunities to apply these skills and 1134 00:45:23,220 --> 00:45:27,960 at the end of the summer the students uh 1135 00:45:25,859 --> 00:45:29,700 would very well have contributed to the 1136 00:45:27,960 --> 00:45:31,200 agency would have learned 1137 00:45:29,700 --> 00:45:34,560 um my guess is that over an intense 1138 00:45:31,200 --> 00:45:35,280 summer they could do five or six papers 1139 00:45:34,560 --> 00:45:37,920 um 1140 00:45:35,280 --> 00:45:39,720 with a report that comes out of that and 1141 00:45:37,920 --> 00:45:41,520 then take that back into say their 1142 00:45:39,720 --> 00:45:43,319 senior year at the University and apply 1143 00:45:41,520 --> 00:45:45,060 it to work that they do there as well 1144 00:45:43,319 --> 00:45:46,319 with an eye on where they might be 1145 00:45:45,060 --> 00:45:49,440 employed afterwards whether that's 1146 00:45:46,319 --> 00:45:51,900 graduate school or professional life 1147 00:45:49,440 --> 00:45:54,500 um so that's sort of the the basic uh 1148 00:45:51,900 --> 00:45:56,640 gist of uh transferring the training 1149 00:45:54,500 --> 00:45:59,160 that comes out of a very specific 1150 00:45:56,640 --> 00:46:00,900 purpose of a journal's perspective uh 1151 00:45:59,160 --> 00:46:03,240 and applying it to a more General 1152 00:46:00,900 --> 00:46:07,520 context uh like internships at data 1153 00:46:03,240 --> 00:46:07,520 intensive agencies I'll stop there 1154 00:46:09,170 --> 00:46:12,720 [Music] 1155 00:46:10,859 --> 00:46:13,920 we have nothing to do with the journals 1156 00:46:12,720 --> 00:46:17,160 and you'll think about every Federal 1157 00:46:13,920 --> 00:46:18,020 agency making their 
stuff reproducible 1158 00:46:17,160 --> 00:46:21,060 um 1159 00:46:18,020 --> 00:46:22,260 I don't actually so I think it lines up 1160 00:46:21,060 --> 00:46:25,079 with 1161 00:46:22,260 --> 00:46:28,500 um the current focus on transparency as 1162 00:46:25,079 --> 00:46:31,079 we go forward uh the future National 1163 00:46:28,500 --> 00:46:33,020 Secure Data Service uh will need some 1164 00:46:31,079 --> 00:46:36,300 sort of transparency and reproducibility 1165 00:46:33,020 --> 00:46:39,180 uh it goes forward with credibility 1166 00:46:36,300 --> 00:46:40,560 um when you can't publish the data what 1167 00:46:39,180 --> 00:46:41,940 kind of reliance do I have that you 1168 00:46:40,560 --> 00:46:44,099 actually did the work 1169 00:46:41,940 --> 00:46:45,660 properly in the first place 1170 00:46:44,099 --> 00:46:47,819 um and so conveying that there is a 1171 00:46:45,660 --> 00:46:50,640 review process in place I think is 1172 00:46:47,819 --> 00:46:52,920 useful but that review process can be 1173 00:46:50,640 --> 00:46:55,920 the RA will never 1174 00:46:52,920 --> 00:46:58,020 uh second guess your your the 1175 00:46:55,920 --> 00:47:01,319 correctness of your code 1176 00:46:58,020 --> 00:47:03,480 and that is a key important part of 1177 00:47:01,319 --> 00:47:05,400 going forward but they can certainly 1178 00:47:03,480 --> 00:47:06,720 assess whether it actually runs in a 1179 00:47:05,400 --> 00:47:09,240 reproducible manner so that somebody 1180 00:47:06,720 --> 00:47:10,140 else can do that 1181 00:47:09,240 --> 00:47:15,000 um 1182 00:47:10,140 --> 00:47:16,560 I um from the work at the AEA I I don't 1183 00:47:15,000 --> 00:47:19,560 get a lot of feedback about how useful 1184 00:47:16,560 --> 00:47:22,079 the reproducible archives are except 1185 00:47:19,560 --> 00:47:24,300 anecdotally at some conferences and one 1186 00:47:22,079 --> 00:47:26,579 of my satisfying moments is when 1187 00:47:24,300 --> 00:47:28,560 somebody said this was
something that I 1188 00:47:26,579 --> 00:47:29,940 downloaded I set aside three weeks so 1189 00:47:28,560 --> 00:47:32,160 that I could work on making it 1190 00:47:29,940 --> 00:47:32,940 reproducible and it worked within three 1191 00:47:32,160 --> 00:47:36,480 hours 1192 00:47:32,940 --> 00:47:38,700 right that's because we have done that 1193 00:47:36,480 --> 00:47:41,640 pre-vetting into something that makes it 1194 00:47:38,700 --> 00:47:43,319 feasible and now you're three weeks into 1195 00:47:41,640 --> 00:47:45,060 your research project that said I I 1196 00:47:43,319 --> 00:47:47,460 don't believe that these assumptions are 1197 00:47:45,060 --> 00:47:49,380 right what happens when 1198 00:47:47,460 --> 00:47:52,560 we've accelerated the research process 1199 00:47:49,380 --> 00:47:53,940 right and that is true for any of of any 1200 00:47:52,560 --> 00:47:55,980 of the papers that might come out that 1201 00:47:53,940 --> 00:47:57,000 are policy papers that should you know 1202 00:47:55,980 --> 00:47:58,920 if they're if they're putting something 1203 00:47:57,000 --> 00:48:01,020 forward they should at a minimum be 1204 00:47:58,920 --> 00:48:02,280 reproducible that still doesn't make 1205 00:48:01,020 --> 00:48:04,500 them correct but at least they're 1206 00:48:02,280 --> 00:48:07,400 reproducible it allows others to then to 1207 00:48:04,500 --> 00:48:07,400 to second guess those 1208 00:48:08,099 --> 00:48:13,619 okay uh well I I do want to take a 1209 00:48:11,280 --> 00:48:15,900 moment thanks Lawrence to to thank the 1210 00:48:13,619 --> 00:48:18,900 panelists and then we'll turn over uh to 1211 00:48:15,900 --> 00:48:22,579 uh some questions from from the floor 1212 00:48:18,900 --> 00:48:22,579 um but let's give a round of applause 1213 00:48:23,579 --> 00:48:30,079 okay um so uh are there any questions 1214 00:48:27,240 --> 00:48:30,079 from the floor 1215 00:48:31,020 --> 00:48:36,599 I guess I'm curious about implementing 1216
00:48:33,540 --> 00:48:37,619 and undergraduates uh program so I'm 1217 00:48:36,599 --> 00:48:39,119 curious about like what are the actual 1218 00:48:37,619 --> 00:48:41,040 stumbling blocks that you would run into 1219 00:48:39,119 --> 00:48:43,500 when you have 1220 00:48:41,040 --> 00:48:44,940 um try to implement this at Haverford um 1221 00:48:43,500 --> 00:48:47,160 in terms when you are actually speaking 1222 00:48:44,940 --> 00:48:50,099 to other faculty members 1223 00:48:47,160 --> 00:48:52,680 um how has that worked and so 1224 00:48:50,099 --> 00:48:54,900 at Wake Forest we just started teaching 1225 00:48:52,680 --> 00:48:56,099 a exactly you said probably isn't the 1226 00:48:54,900 --> 00:48:57,859 best practice a one and a half hour 1227 00:48:56,099 --> 00:49:00,000 course I'll representation 1228 00:48:57,859 --> 00:49:02,460 right now 1229 00:49:00,000 --> 00:49:04,500 um and so we're taking baby steps in 1230 00:49:02,460 --> 00:49:07,319 that direction that seems like a ways 1231 00:49:04,500 --> 00:49:10,319 off for our so I'm curious to hear you 1232 00:49:07,319 --> 00:49:12,180 talk about your experience with that one 1233 00:49:10,319 --> 00:49:13,980 so 1234 00:49:12,180 --> 00:49:17,660 it's sort of like one of the hurdles to 1235 00:49:13,980 --> 00:49:17,660 overcome to make this happen so 1236 00:49:19,260 --> 00:49:22,619 thinking of the traditional mindset 1237 00:49:20,940 --> 00:49:24,300 we've had of like what does an 1238 00:49:22,619 --> 00:49:27,839 individual faculty member have to do to 1239 00:49:24,300 --> 00:49:27,839 make this happen uh 1240 00:49:32,040 --> 00:49:39,060 it takes some startup costs but it's 1241 00:49:35,400 --> 00:49:42,359 really not as bad as you would think so 1242 00:49:39,060 --> 00:49:45,060 you know what that the place to start 1243 00:49:42,359 --> 00:49:46,920 with is what you want to do with your 1244 00:49:45,060 --> 00:49:48,480 students and what you want to teach them 1245 00:49:46,920 
--> 00:49:50,880 and what kind of like what kind of 1246 00:49:48,480 --> 00:49:52,200 exercises have you been 1247 00:49:50,880 --> 00:49:54,000 and then 1248 00:49:52,200 --> 00:49:57,000 you just say okay so then how can I 1249 00:49:54,000 --> 00:49:59,400 build reproducibility on top of that 1250 00:49:57,000 --> 00:50:00,900 I think if you go like for instance like 1251 00:49:59,400 --> 00:50:02,339 look at the TIER Protocol and say how do 1252 00:50:00,900 --> 00:50:04,319 I do this 1253 00:50:02,339 --> 00:50:06,300 it's not going to be as effective as 1254 00:50:04,319 --> 00:50:08,040 saying okay how can I wrap up this 1255 00:50:06,300 --> 00:50:09,780 project I already had 1256 00:50:08,040 --> 00:50:10,920 and just what extra pieces would 1257 00:50:09,780 --> 00:50:13,680 students have to do to make it 1258 00:50:10,920 --> 00:50:17,040 reproducible and and now one common 1259 00:50:13,680 --> 00:50:19,040 stumbling block is writing scripts 1260 00:50:17,040 --> 00:50:21,060 so 1261 00:50:19,040 --> 00:50:22,800 the first few times from your faculty 1262 00:50:21,060 --> 00:50:24,300 development workshops 1263 00:50:22,800 --> 00:50:26,040 we didn't say anything in the 1264 00:50:24,300 --> 00:50:27,900 information explicitly about writing 1265 00:50:26,040 --> 00:50:30,599 scripts and then like through the people 1266 00:50:27,900 --> 00:50:31,980 show up like oh we have to use Excel no our 1267 00:50:30,599 --> 00:50:33,720 department says we have to use Excel and 1268 00:50:31,980 --> 00:50:35,460 I didn't really 1269 00:50:33,720 --> 00:50:38,220 they didn't get much out of the next two 1270 00:50:35,460 --> 00:50:41,099 days so we started making that clear in 1271 00:50:38,220 --> 00:50:42,060 the information so it's got to be writing 1272 00:50:41,099 --> 00:50:44,400 scripts 1273 00:50:42,060 --> 00:50:48,380 and what whether it's 1274 00:50:44,400 --> 00:50:51,740 Stata do-files or SPSS or anything else 1275 00:50:48,380 --> 00:50:51,740 but then 1276
00:50:52,020 --> 00:50:55,319 then the short version is just 1277 00:50:53,760 --> 00:50:58,020 everything they do they write in they 1278 00:50:55,319 --> 00:51:00,720 write do-files or scripts of some kind 1279 00:50:58,020 --> 00:51:03,240 and save them and organize them and 1280 00:51:00,720 --> 00:51:05,819 and you can you can ratchet it up bit by 1281 00:51:03,240 --> 00:51:07,079 bit kind of incrementally like the sort 1282 00:51:05,819 --> 00:51:08,460 of the full-blown thing is to have like 1283 00:51:07,079 --> 00:51:11,280 separate folders for everything and 1284 00:51:08,460 --> 00:51:13,200 relative directory paths that save stuff 1285 00:51:11,280 --> 00:51:14,760 where it's supposed to go and grab data 1286 00:51:13,200 --> 00:51:17,640 from where it lives 1287 00:51:14,760 --> 00:51:20,520 but that doesn't have to be what the 1288 00:51:17,640 --> 00:51:22,619 first time and and you know what 1289 00:51:20,520 --> 00:51:25,980 it's really not that hard the biggest 1290 00:51:22,619 --> 00:51:27,900 stumbling block is getting people 1291 00:51:25,980 --> 00:51:31,200 just to 1292 00:51:27,900 --> 00:51:34,079 spend a few days on it thinking about 1293 00:51:31,200 --> 00:51:35,880 and and uh 1294 00:51:34,079 --> 00:51:37,859 and 1295 00:51:35,880 --> 00:51:39,420 and part of what Project TIER is about 1296 00:51:37,859 --> 00:51:41,700 is helping people do that so if there's 1297 00:51:39,420 --> 00:51:43,140 any interest at Wake Forest or you want 1298 00:51:41,700 --> 00:51:45,900 to get in touch 1299 00:51:43,140 --> 00:51:47,579 we are happy to work on a custom basis 1300 00:51:45,900 --> 00:51:48,300 just try to help you figure out what you 1301 00:51:47,579 --> 00:51:50,819 want 1302 00:51:48,300 --> 00:51:52,680 I might add to that what I've found to 1303 00:51:50,819 --> 00:51:55,200 be quite useful 1304 00:51:52,680 --> 00:51:56,700 um is to have a motivating example that 1305 00:51:55,200 --> 00:51:58,200 leads into it so you start with
1306 00:51:56,700 --> 00:51:59,940 something that is easy 1307 00:51:58,200 --> 00:52:01,619 what we've done in the past is to sort of 1308 00:51:59,940 --> 00:52:04,559 say okay here's this thing we want you 1309 00:52:01,619 --> 00:52:06,720 to compute for uh I don't know uh 1310 00:52:04,559 --> 00:52:09,720 Tompkins County right uh that's where 1311 00:52:06,720 --> 00:52:11,280 Cornell is located so go find these 1312 00:52:09,720 --> 00:52:14,940 three numbers for Tompkins County 1313 00:52:11,280 --> 00:52:16,319 compute this statistic okay next week we 1314 00:52:14,940 --> 00:52:18,240 go back and say okay you figured this 1315 00:52:16,319 --> 00:52:20,520 thing out now we're going to do it for 1316 00:52:18,240 --> 00:52:22,859 every county in the U.S. 1317 00:52:20,520 --> 00:52:24,119 that is the introduction to they're not 1318 00:52:22,859 --> 00:52:26,460 going to do it that week because now we 1319 00:52:24,119 --> 00:52:28,380 need to go back and say okay what were 1320 00:52:26,460 --> 00:52:30,660 the steps that you did how can you start 1321 00:52:28,380 --> 00:52:32,819 to make this reproducible 3000 plus 1322 00:52:30,660 --> 00:52:34,859 times because you don't want to spend 1323 00:52:32,819 --> 00:52:38,099 the time doing that you could but you 1324 00:52:34,859 --> 00:52:40,020 don't want to right and then you get to 1325 00:52:38,099 --> 00:52:43,440 okay now you've got these data sources 1326 00:52:40,020 --> 00:52:45,900 how do we download them structure them 1327 00:52:43,440 --> 00:52:47,819 where you want to keep them uh you might 1328 00:52:45,900 --> 00:52:49,200 have a ton of output files how where do 1329 00:52:47,819 --> 00:52:50,940 you would you put them if you start to 1330 00:52:49,200 --> 00:52:53,700 have a ton of output files 1331 00:52:50,940 --> 00:52:55,740 um Etc so you build up from a single 1332 00:52:53,700 --> 00:52:57,660 motivating example that then expands 1333 00:52:55,740 --> 00:53:00,540 drastically 1334
00:52:57,660 --> 00:53:01,859 um and we found that to also be sort of 1335 00:53:00,540 --> 00:53:03,839 the first time around yeah you did it in 1336 00:53:01,859 --> 00:53:05,040 Excel because you can right you can copy 1337 00:53:03,839 --> 00:53:05,819 paste the number from the website into 1338 00:53:05,040 --> 00:53:07,380 this 1339 00:53:05,819 --> 00:53:08,700 second time around well we're going to 1340 00:53:07,380 --> 00:53:10,200 do it three thousand times maybe that 1341 00:53:08,700 --> 00:53:11,819 doesn't work so well anymore 1342 00:53:10,200 --> 00:53:13,559 although somebody will come around and 1343 00:53:11,819 --> 00:53:16,079 figure out how to do matrix manipulation 1344 00:53:13,559 --> 00:53:17,520 in in Excel and as long as you can 1345 00:53:16,079 --> 00:53:19,319 explain to the next person because then 1346 00:53:17,520 --> 00:53:21,119 we add that peer review onto it you're 1347 00:53:19,319 --> 00:53:23,940 going to exchange in class 1348 00:53:21,119 --> 00:53:26,880 and you're gonna have to read how to do 1349 00:53:23,940 --> 00:53:29,760 it and do it yourself as well 1350 00:53:26,880 --> 00:53:31,920 that's one way to sort of motivate it 1351 00:53:29,760 --> 00:53:34,079 which 1352 00:53:31,920 --> 00:53:35,819 I I find when I train those those 1353 00:53:34,079 --> 00:53:37,319 undergrads we walk through examples 1354 00:53:35,819 --> 00:53:38,940 simply because you're going to see far 1355 00:53:37,319 --> 00:53:40,920 more complex papers 1356 00:53:38,940 --> 00:53:43,020 we want you to stumble on when it's 1357 00:53:40,920 --> 00:53:44,940 really complexities decomposing simpler 1358 00:53:43,020 --> 00:53:48,300 parts and then go from there 1359 00:53:44,940 --> 00:53:50,839 that that I found helps a lot 1360 00:53:48,300 --> 00:53:50,839 so yeah 1361 00:53:57,780 --> 00:53:59,780 um 1362 00:54:09,960 --> 00:54:14,359 [Music] 1363 00:54:11,059 --> 00:54:16,859 don't have any access to computing power 1364 00:54:14,359 -->
00:54:18,780 but are interested in economics they 1365 00:54:16,859 --> 00:54:20,280 actually find those kind of guidelines 1366 00:54:18,780 --> 00:54:22,920 in terms of what are the best practices 1367 00:54:20,280 --> 00:54:25,140 how you can do this stuff the house was 1368 00:54:22,920 --> 00:54:28,140 very very useful that's a way to kind of 1369 00:54:25,140 --> 00:54:29,760 be in pull them in and also dissipate 1370 00:54:28,140 --> 00:54:31,980 that anxiety 1371 00:54:29,760 --> 00:54:35,119 um that they have associated with you 1372 00:54:31,980 --> 00:54:38,760 know courses in general or leader courses 1373 00:54:35,119 --> 00:54:41,460 but trying to convince the faculty 1374 00:54:38,760 --> 00:54:43,740 um and our efforts basically to change 1375 00:54:41,460 --> 00:54:45,720 the the two norms that you were asking 1376 00:54:43,740 --> 00:54:47,940 us to basically change which is full 1377 00:54:45,720 --> 00:54:50,040 transparency in terms of your work right 1378 00:54:47,940 --> 00:54:51,839 so whatever it is that we are producing 1379 00:54:50,040 --> 00:54:53,460 or they're producing as followers as 1380 00:54:51,839 --> 00:54:55,440 researchers and then also full 1381 00:54:53,460 --> 00:54:57,480 transparency in terms of how do you do 1382 00:54:55,440 --> 00:54:59,579 that to yourself so you can teach your 1383 00:54:57,480 --> 00:55:02,900 students he's a little bit I find it 1384 00:54:59,579 --> 00:55:05,460 this it's not fair to us 1385 00:55:02,900 --> 00:55:06,900 who do it you know for ourselves and 1386 00:55:05,460 --> 00:55:08,400 for our students I feel like there are 1387 00:55:06,900 --> 00:55:11,819 going to be some sort of institutional 1388 00:55:08,400 --> 00:55:14,160 norms of change at higher levels that 1389 00:55:11,819 --> 00:55:15,540 would help hold us instead of just you 1390 00:55:14,160 --> 00:55:17,099 know resting on the shoulders of 1391 00:55:15,540 --> 00:55:19,559 individual instructors who are willing 1392
00:55:17,099 --> 00:55:22,819 to put your you know our time and change 1393 00:55:19,559 --> 00:55:27,000 the curriculum involved into the brain 1394 00:55:22,819 --> 00:55:29,940 I mean I I would put on my hat as data 1395 00:55:27,000 --> 00:55:33,059 editor in the sense of if you are intent on 1396 00:55:29,940 --> 00:55:35,460 publishing in 1397 00:55:33,059 --> 00:55:38,099 the top part of the publication 1398 00:55:35,460 --> 00:55:39,540 distribution that is what you have to do 1399 00:55:38,099 --> 00:55:41,339 so you don't you can't get away with 1400 00:55:39,540 --> 00:55:43,920 less 1401 00:55:41,339 --> 00:55:45,420 um there's an argument made by 1402 00:55:43,920 --> 00:55:48,059 um 1403 00:55:45,420 --> 00:55:51,000 Tim Salmon uh in actually the first 1404 00:55:48,059 --> 00:55:52,319 session of this webinar series that you 1405 00:55:51,000 --> 00:55:53,819 really don't want to be the last journal not 1406 00:55:52,319 --> 00:55:56,040 asking for reproducibility because what 1407 00:55:53,819 --> 00:55:58,099 are the papers you're going to get 1408 00:55:56,040 --> 00:56:00,540 um so there there there 1409 00:55:58,099 --> 00:56:03,540 probably can be I don't know if it's 1410 00:56:00,540 --> 00:56:04,680 necessarily so a race to the top rather 1411 00:56:03,540 --> 00:56:07,680 than a race to the bottom in terms 1412 00:56:04,680 --> 00:56:09,540 of transparency and if you articulate it 1413 00:56:07,680 --> 00:56:12,359 that way 1414 00:56:09,540 --> 00:56:14,700 um I would say friends well if you can't 1415 00:56:12,359 --> 00:56:16,140 point at your own papers because they 1416 00:56:14,700 --> 00:56:17,700 historically weren't within that 1417 00:56:16,140 --> 00:56:20,160 paradigm because we have been shifting 1418 00:56:17,700 --> 00:56:21,839 that paradigm and go find some of the 1419 00:56:20,160 --> 00:56:23,520 simpler articles say in the Journal of 1420 00:56:21,839 --> 00:56:25,200 Economic Perspectives or things like 1421
00:56:23,520 --> 00:56:27,119 that where there's like three tables in 1422 00:56:25,200 --> 00:56:29,520 the paper or something like that but 1423 00:56:27,119 --> 00:56:31,020 they are all now also reproducible and 1424 00:56:29,520 --> 00:56:32,099 transparent in their methods and they 1425 00:56:31,020 --> 00:56:33,839 might be simple and they might even 1426 00:56:32,099 --> 00:56:37,319 include some Excel files 1427 00:56:33,839 --> 00:56:40,500 but they are a way to motivate why would 1428 00:56:37,319 --> 00:56:42,720 you do it that way right and and so it 1429 00:56:40,500 --> 00:56:44,460 may not be your own work lots of our old 1430 00:56:42,720 --> 00:56:46,079 work is not necessarily up to those 1431 00:56:44,460 --> 00:56:47,640 standards 1432 00:56:46,079 --> 00:56:49,680 um because it wasn't scrutinized as 1433 00:56:47,640 --> 00:56:51,720 definitely as it is today but if you go 1434 00:56:49,680 --> 00:56:53,880 out today and I'm not just saying go to 1435 00:56:51,720 --> 00:56:56,280 the a journals go to any number of of 1436 00:56:53,880 --> 00:56:58,079 the journals that have been doing this 1437 00:56:56,280 --> 00:56:59,579 kind of vetting with the data editor and 1438 00:56:58,079 --> 00:57:02,460 editor in charge of doing these kinds of 1439 00:56:59,579 --> 00:57:04,440 things uh you will find papers of any 1440 00:57:02,460 --> 00:57:06,740 desirable complexity to certain things 1441 00:57:04,440 --> 00:57:09,240 as examples for that 1442 00:57:06,740 --> 00:57:11,099 you might start with toy examples 1443 00:57:09,240 --> 00:57:12,359 because that gets people into this 1444 00:57:11,099 --> 00:57:14,520 easier 1445 00:57:12,359 --> 00:57:16,859 um and then progress to sort of saying 1446 00:57:14,520 --> 00:57:18,240 you know there's sophisticated economic 1447 00:57:16,859 --> 00:57:20,339 reasoning around these texts that's 1448 00:57:18,240 --> 00:57:22,079 Illustrated with a couple of graphs and 1449 00:57:20,339 --> 00:57:23,640 
those graphs are made in a transparent 1450 00:57:22,079 --> 00:57:26,400 way in a way that you can reproduce them 1451 00:57:23,640 --> 00:57:28,920 for instance by going back to uh to FRED 1452 00:57:26,400 --> 00:57:30,420 to sort of get at the data Etc 1453 00:57:28,920 --> 00:57:32,099 I don't think I have a problem 1454 00:57:30,420 --> 00:57:34,200 convincing the students to do it right 1455 00:57:32,099 --> 00:57:36,180 or to doing PhD level students because 1456 00:57:34,200 --> 00:57:37,380 they they know they need the skill they 1457 00:57:36,180 --> 00:57:39,240 need to learn it and they typically 1458 00:57:37,380 --> 00:57:41,460 don't actually get even 1459 00:57:39,240 --> 00:57:44,339 specific guidance on how to do it 1460 00:57:41,460 --> 00:57:46,559 and if I'm talking to an African-American 1461 00:57:44,339 --> 00:57:48,000 student who wants to go into PhD she 1462 00:57:46,559 --> 00:57:49,980 definitely doesn't have those skills 1463 00:57:48,000 --> 00:57:54,180 because she also needs to put herself 1464 00:57:49,980 --> 00:57:56,400 out there asking to to control that yeah 1465 00:57:54,180 --> 00:57:58,740 so pulling that back into the curriculum 1466 00:57:56,400 --> 00:58:01,140 but that's going back to school to the 1467 00:57:58,740 --> 00:58:02,940 faculty who is teaching the course or 1468 00:58:01,140 --> 00:58:04,800 teaching services and saying can we 1469 00:58:02,940 --> 00:58:08,579 please put that in but that gets into 1470 00:58:04,800 --> 00:58:11,160 very marginal you know costs and that's 1471 00:58:08,579 --> 00:58:12,599 where you have the law you know I'm not 1472 00:58:11,160 --> 00:58:15,240 willing to adopt it because I already 1473 00:58:12,599 --> 00:58:17,700 have my syllabus and 1474 00:58:15,240 --> 00:58:22,099 I mean like Richard said there is an 1475 00:58:17,700 --> 00:58:22,099 initial cost in doing so but it it 1476 00:58:22,740 --> 00:58:26,220 you can probably speak to this more than 1477
00:58:24,420 --> 00:58:28,319 I can do but one of the 1478 00:58:26,220 --> 00:58:30,059 things that you see in the literature is 1479 00:58:28,319 --> 00:58:31,680 that that initial investment can 1480 00:58:30,059 --> 00:58:35,520 actually open up other pedagogical 1481 00:58:31,680 --> 00:58:37,079 capabilities right so for instance I 1482 00:58:35,520 --> 00:58:38,940 don't need to rerun your reproducible 1483 00:58:37,079 --> 00:58:40,079 code if it's truly reproducible I can 1484 00:58:38,940 --> 00:58:42,180 have 1485 00:58:40,079 --> 00:58:44,280 the system where you run it right we're 1486 00:58:42,180 --> 00:58:46,680 going to work from GitHub education or 1487 00:58:44,280 --> 00:58:48,299 some other platform Etc people do this 1488 00:58:46,680 --> 00:58:50,220 with Dropbox folders something appears 1489 00:58:48,299 --> 00:58:52,200 in the folder they script on it and run 1490 00:58:50,220 --> 00:58:54,420 it that actually makes your life an 1491 00:58:52,200 --> 00:58:56,280 instructor easier because now you can 1492 00:58:54,420 --> 00:58:58,380 provide objective feedback you don't 1493 00:58:56,280 --> 00:59:01,400 need to do this all on your own Etc so 1494 00:58:58,380 --> 00:59:04,799 it can enable new feedback mechanisms 1495 00:59:01,400 --> 00:59:07,799 that actually accelerate the educational 1496 00:59:04,799 --> 00:59:09,839 process right because you're not just 1497 00:59:07,799 --> 00:59:12,900 lecturing you're actually interactively 1498 00:59:09,839 --> 00:59:14,880 developing this type of it's but that is 1499 00:59:12,900 --> 00:59:17,280 an argument that has to be made that has 1500 00:59:14,880 --> 00:59:18,960 to be bought and itself 1501 00:59:17,280 --> 00:59:20,520 there isn't a lot of empirical evidence 1502 00:59:18,960 --> 00:59:22,579 there's a lot of anecdotal evidence that 1503 00:59:20,520 --> 00:59:22,579 this 1504 00:59:23,220 --> 00:59:28,380 way so and and uh 1505 00:59:26,339 --> 00:59:30,119 yeah and but convincing 
people that's 1506 00:59:28,380 --> 00:59:32,400 hard so that I mean that I'm glad you 1507 00:59:30,119 --> 00:59:34,740 asked and I hope we can talk afterwards 1508 00:59:32,400 --> 00:59:37,200 because I I don't know the answer 1509 00:59:34,740 --> 00:59:38,339 exactly but somehow getting these norms 1510 00:59:37,200 --> 00:59:41,960 to shift 1511 00:59:38,339 --> 00:59:41,960 and you know one one 1512 00:59:42,059 --> 00:59:46,920 promising thing is that in there are 1513 00:59:45,180 --> 00:59:49,740 other disciplines that I like with like 1514 00:59:46,920 --> 00:59:51,599 the American Statistical Association in 1515 00:59:49,740 --> 00:59:53,540 their education committee there's some 1516 00:59:51,599 --> 00:59:56,819 people who are very committed to this 1517 00:59:53,540 --> 00:59:58,680 and would be happy to join in some 1518 00:59:56,819 --> 01:00:00,299 efforts in political science and 1519 00:59:58,680 --> 01:00:01,859 sociology 1520 01:00:00,299 --> 01:00:04,460 so 1521 01:00:01,859 --> 01:00:04,460 so 1522 01:00:05,220 --> 01:00:10,440 like shifting the norms is an important 1523 01:00:07,260 --> 01:00:12,359 thing what one one paradox to think 1524 01:00:10,440 --> 01:00:14,160 about is 1525 01:00:12,359 --> 01:00:16,920 it'll be nice to have some maybe 1526 01:00:14,160 --> 01:00:18,480 top-down leadership in this thing 1527 01:00:16,920 --> 01:00:21,480 instructors being told they have to do 1528 01:00:18,480 --> 01:00:24,720 this but but then I think 1529 01:00:21,480 --> 01:00:26,760 I'm you know when the associate dean 1530 01:00:24,720 --> 01:00:29,819 comes and says here's how you have to do 1531 01:00:26,760 --> 01:00:32,160 things that's not a good way to start so 1532 01:00:29,819 --> 01:00:35,160 okay something that you said in your 1533 01:00:32,160 --> 01:00:36,420 remarks this got me to think about you 1534 01:00:35,160 --> 01:00:39,599 know this idea that you know your 1535 01:00:36,420 --> 01:00:42,180 students my students would
do some 1536 01:00:39,599 --> 01:00:44,579 hard to digest things with their data 1537 01:00:42,180 --> 01:00:46,980 and it would make no sense 1538 01:00:44,579 --> 01:00:49,559 but it's as long as just what happens in 1539 01:00:46,980 --> 01:00:51,540 the classrooms in the classroom we're 1540 01:00:49,559 --> 01:00:53,700 not going to make any progress these 1541 01:00:51,540 --> 01:00:55,319 practices are able to connect with 1542 01:00:53,700 --> 01:00:57,240 habits in the classroom but things that 1543 01:00:55,319 --> 01:00:58,559 are going to happen beyond the classroom 1544 01:00:57,240 --> 01:00:59,700 so 1545 01:00:58,559 --> 01:01:02,160 just 1546 01:00:59,700 --> 01:01:04,020 I don't know where you and I 1547 01:01:02,160 --> 01:01:06,119 what do you need the motivator for that 1548 01:01:04,020 --> 01:01:07,799 change I think it's a professional 1549 01:01:06,119 --> 01:01:10,020 organizational the American economic 1550 01:01:07,799 --> 01:01:12,660 Association needs to communicate that 1551 01:01:10,020 --> 01:01:15,180 the professional certificate that this 1552 01:01:12,660 --> 01:01:17,099 is not just 1553 01:01:15,180 --> 01:01:18,599 it's a best practice a best scientific 1554 01:01:17,099 --> 01:01:19,980 practice I don't know did the American 1555 01:01:18,599 --> 01:01:22,740 economics the American Medical 1556 01:01:19,980 --> 01:01:24,540 Association complicated 1557 01:01:22,740 --> 01:01:27,119 before they operate 1558 01:01:24,540 --> 01:01:29,880 they was that a memo or something that 1559 01:01:27,119 --> 01:01:30,900 everybody discovered individually 1560 01:01:29,880 --> 01:01:32,700 um 1561 01:01:30,900 --> 01:01:33,900 so 1562 01:01:32,700 --> 01:01:36,180 I think we should learn from that 1563 01:01:33,900 --> 01:01:38,400 experience I I should add on I don't 1564 01:01:36,180 --> 01:01:41,280 think economics is unique in that 1565 01:01:38,400 --> 01:01:43,680 challenge I've observed for instance one 1566 01:01:41,280 --> 
01:01:44,760 of my kids has taken a couple biostats 1567 01:01:43,680 --> 01:01:46,980 courses 1568 01:01:44,760 --> 01:01:48,119 and of course I had to sort of open my 1569 01:01:46,980 --> 01:01:50,040 mouth and tell her that there's more 1570 01:01:48,119 --> 01:01:52,319 peaceful way of doing our homework safer 1571 01:01:50,040 --> 01:01:53,579 work on the next homework and 1572 01:01:52,319 --> 01:01:56,640 uh 1573 01:01:53,579 --> 01:01:58,559 there was no credit given for reporting 1574 01:01:56,640 --> 01:02:01,200 it back other than in a copy and paste 1575 01:01:58,559 --> 01:02:04,200 in word form 1576 01:02:01,200 --> 01:02:07,020 um with very very poor reproducibility 1577 01:02:04,200 --> 01:02:09,900 and when things failed little in terms 1578 01:02:07,020 --> 01:02:11,760 of support so it's not unique to 1579 01:02:09,900 --> 01:02:13,859 economics that there might be some 1580 01:02:11,760 --> 01:02:16,319 righteousness there people are in their 1581 01:02:13,859 --> 01:02:18,260 materials the materials work uh they're 1582 01:02:16,319 --> 01:02:22,079 they're doing them 1583 01:02:18,260 --> 01:02:24,619 one idea might be that even the process 1584 01:02:22,079 --> 01:02:27,319 of a faculty member 1585 01:02:24,619 --> 01:02:30,319 enhancing the current 1586 01:02:27,319 --> 01:02:33,059 materials to be something more 1587 01:02:30,319 --> 01:02:34,500 reproducible is itself a teaching 1588 01:02:33,059 --> 01:02:36,299 exercise 1589 01:02:34,500 --> 01:02:37,980 right so actually going through that 1590 01:02:36,299 --> 01:02:40,260 work and detailing what are the steps to 1591 01:02:37,980 --> 01:02:42,359 go from that without necessarily laying 1592 01:02:40,260 --> 01:02:44,220 open that you weren't doing this or Etc 1593 01:02:42,359 --> 01:02:45,720 depending on how open you are about 1594 01:02:44,220 --> 01:02:48,059 these things but just to sort of say 1595 01:02:45,720 --> 01:02:49,740 okay here's the example now we're going 1596 
01:02:48,059 --> 01:02:50,940 to take a detour around the sort of 1597 01:02:49,740 --> 01:02:52,440 blockage that was there that wasn't 1598 01:02:50,940 --> 01:02:54,900 reproducible and making it more 1599 01:02:52,440 --> 01:02:56,280 reproducible as I outlined earlier there 1600 01:02:54,900 --> 01:02:58,140 are steps you can take to sort of 1601 01:02:56,280 --> 01:03:00,180 reverse engineer it into being some 1602 01:02:58,140 --> 01:03:02,160 something that's reproducible 1603 01:03:00,180 --> 01:03:04,440 that whole process itself is a teaching 1604 01:03:02,160 --> 01:03:07,400 experience and that might be a way also 1605 01:03:04,440 --> 01:03:07,400 to gain more acceptance of 1606 01:03:07,559 --> 01:03:13,079 that 1607 01:03:09,500 --> 01:03:15,960 yeah I mean I fully agree with you with 1608 01:03:13,079 --> 01:03:19,079 your points that leading by example is 1609 01:03:15,960 --> 01:03:21,440 pretty much the issue so we have some 1610 01:03:19,079 --> 01:03:21,440 genres 1611 01:03:23,299 --> 01:03:29,960 always get to the problem do we have 1612 01:03:25,920 --> 01:03:29,960 enough resources for them so I 1613 01:03:34,940 --> 01:03:39,440 do you have any solutions for the 1614 01:03:37,140 --> 01:03:42,900 resource program I mean 1615 01:03:39,440 --> 01:03:44,720 everyone have individual program as you 1616 01:03:42,900 --> 01:03:49,140 suggested already 1617 01:03:44,720 --> 01:03:51,180 so I submitted for example two papers to 1618 01:03:49,140 --> 01:03:52,940 your institutions 1619 01:03:51,180 --> 01:03:56,099 um 1620 01:03:52,940 --> 01:03:59,220 and I haven't submitted some code so 1621 01:03:56,099 --> 01:04:01,200 because I wasn't up for it and so I 1622 01:03:59,220 --> 01:04:03,420 didn't have an incentive 1623 01:04:01,200 --> 01:04:04,740 yeah 1624 01:04:03,420 --> 01:04:06,599 um 1625 01:04:04,740 --> 01:04:09,420 I have a future conversation with the 1626 01:04:06,599 --> 01:04:11,940 ILR Review on this um but I I don't have 1627
01:04:09,420 --> 01:04:13,559 much to say about what what they do they 1628 01:04:11,940 --> 01:04:16,020 actually you did actually when 1629 01:04:13,559 --> 01:04:17,819 submitting to them to commit to 1630 01:04:16,020 --> 01:04:19,079 providing code if somebody actually asks 1631 01:04:17,819 --> 01:04:20,280 for it 1632 01:04:19,079 --> 01:04:23,160 um so that might have been the fine 1633 01:04:20,280 --> 01:04:25,559 print but that's actually is in there uh 1634 01:04:23,160 --> 01:04:27,359 that's known not to work super well but 1635 01:04:25,559 --> 01:04:29,040 um we'll see 1636 01:04:27,359 --> 01:04:30,839 um I mean this goes a bit beyond where 1637 01:04:29,040 --> 01:04:32,880 undergraduate education comes into play 1638 01:04:30,839 --> 01:04:35,819 I think part of it is a supply and 1639 01:04:32,880 --> 01:04:37,920 demand thing that as it becomes easier 1640 01:04:35,819 --> 01:04:40,799 to supply the reproducibility part it'll 1641 01:04:37,920 --> 01:04:43,260 become more natural to provide it in the 1642 01:04:40,799 --> 01:04:45,480 first place and 1643 01:04:43,260 --> 01:04:47,460 um one way to articulate to switch a bit 1644 01:04:45,480 --> 01:04:49,319 to the journal part 1645 01:04:47,460 --> 01:04:51,359 um even if the journal doesn't require 1646 01:04:49,319 --> 01:04:53,819 it there are a lot of authors who 1647 01:04:51,359 --> 01:04:55,799 provide it anyway right and so you might 1648 01:04:53,819 --> 01:04:57,960 have a GitHub open out there and you 1649 01:04:55,799 --> 01:05:01,559 might have archived your stuff 1650 01:04:57,960 --> 01:05:03,960 part of what I did at the AEA was say if 1651 01:05:01,559 --> 01:05:05,880 you've actually done these steps ahead 1652 01:05:03,960 --> 01:05:08,579 of time you've actually say archived 1653 01:05:05,880 --> 01:05:11,520 your replication package on Dataverse on 1654 01:05:08,579 --> 01:05:13,440 ICPSR on Zenodo I am actually if it's 1655 01:05:11,520 --> 01:05:15,119 fine I'm not actually
going to move it 1656 01:05:13,440 --> 01:05:17,760 it's fine where it is you've done your 1657 01:05:15,119 --> 01:05:20,099 job we're done right 1658 01:05:17,760 --> 01:05:21,900 um generally there needs to be a bit of 1659 01:05:20,099 --> 01:05:23,579 improvement on the documentation but 1660 01:05:21,900 --> 01:05:24,900 that can be handled things like that so 1661 01:05:23,579 --> 01:05:26,280 there need to be incentives to also 1662 01:05:24,900 --> 01:05:27,660 facilitate it if you've actually done 1663 01:05:26,280 --> 01:05:29,819 that 1664 01:05:27,660 --> 01:05:32,520 one of the things that I argue to 1665 01:05:29,819 --> 01:05:34,500 graduate students which also to some 1666 01:05:32,520 --> 01:05:36,839 extent applies to undergraduate students 1667 01:05:34,500 --> 01:05:38,940 whether or not they're my kids or not is 1668 01:05:36,839 --> 01:05:41,520 that this actually saves you time in 1669 01:05:38,940 --> 01:05:44,540 your other classes and now it becomes a 1670 01:05:41,520 --> 01:05:46,859 win-win situation right I've seen papers 1671 01:05:44,540 --> 01:05:48,780 from clearly 1672 01:05:46,859 --> 01:05:49,680 coming out of a thesis of a graduate 1673 01:05:48,780 --> 01:05:52,200 student 1674 01:05:49,680 --> 01:05:55,020 that were extraordinarily time intensive 1675 01:05:52,200 --> 01:05:56,480 and really didn't need to be 1676 01:05:55,020 --> 01:05:59,940 and 1677 01:05:56,480 --> 01:06:01,799 that is how you value your own time is 1678 01:05:59,940 --> 01:06:05,099 one of the key factors here over which 1679 01:06:01,799 --> 01:06:07,200 you have control so just saying okay my 1680 01:06:05,099 --> 01:06:09,000 my thesis advisor asked me to do 30 1681 01:06:07,200 --> 01:06:10,500 different variations of this model and 1682 01:06:09,000 --> 01:06:12,000 I'm going to hand code all of them or 1683 01:06:10,500 --> 01:06:13,980 I'm going to script that invest a bit 1684 01:06:12,000 --> 01:06:16,079 into the scripting and then just push 
1685 01:06:13,980 --> 01:06:18,299 the button and do and you want 300 of 1686 01:06:16,079 --> 01:06:20,700 them sure I can do that 1687 01:06:18,299 --> 01:06:22,440 um I think that's part of the equation 1688 01:06:20,700 --> 01:06:24,420 as well is to get the students 1689 01:06:22,440 --> 01:06:27,599 understand this isn't just 1690 01:06:24,420 --> 01:06:29,760 for external validity this is also Time 1691 01:06:27,599 --> 01:06:30,599 Saver like reproducibility is a Time 1692 01:06:29,760 --> 01:06:32,640 Saver 1693 01:06:30,599 --> 01:06:35,339 you messed up in that data cleaning Step 1694 01:06:32,640 --> 01:06:36,599 At the very start now you need to redo 1695 01:06:35,339 --> 01:06:38,280 your whole exercise that's due tomorrow 1696 01:06:36,599 --> 01:06:39,900 because you just found this in doing the 1697 01:06:38,280 --> 01:06:41,420 last revision if it's reproducible it's 1698 01:06:39,900 --> 01:06:44,579 push 1699 01:06:41,420 --> 01:06:46,980 if not if you're going to copy and paste 1700 01:06:44,579 --> 01:06:47,819 this all back into word it's a lot of 1701 01:06:46,980 --> 01:06:50,880 work 1702 01:06:47,819 --> 01:06:53,220 good night so that's self-motivator in 1703 01:06:50,880 --> 01:06:55,020 there is I think a key aspect that we 1704 01:06:53,220 --> 01:06:56,099 can bring through the entire research 1705 01:06:55,020 --> 01:06:58,559 stack all the way back to the 1706 01:06:56,099 --> 01:07:01,200 undergraduates as well 1707 01:06:58,559 --> 01:07:03,539 but once it becomes more easy to do so I 1708 01:07:01,200 --> 01:07:05,940 mean my one of my points on why I think 1709 01:07:03,539 --> 01:07:08,039 this this internship or or practicum 1710 01:07:05,940 --> 01:07:09,900 with journals might be of interest is 1711 01:07:08,039 --> 01:07:11,880 because it reduces the cost it turns 1712 01:07:09,900 --> 01:07:13,559 what appears to be a cost into a 1713 01:07:11,880 --> 01:07:15,660 positive exercise that both provides 1714 01:07:13,559 --> 
01:07:17,640 educational benefit to the students 1715 01:07:15,660 --> 01:07:19,260 participating in it and provides a 1716 01:07:17,640 --> 01:07:21,780 benefit to the institution journal 1717 01:07:19,260 --> 01:07:24,900 agency whatever that is uh going through 1718 01:07:21,780 --> 01:07:26,520 that exercise with possibly a oh my God 1719 01:07:24,900 --> 01:07:27,839 this isn't as reproducible as we thought 1720 01:07:26,520 --> 01:07:30,000 it was 1721 01:07:27,839 --> 01:07:31,799 um but that's that's the painfulness of 1722 01:07:30,000 --> 01:07:33,780 transparency occasionally but there's a 1723 01:07:31,799 --> 01:07:35,099 win-win possibility there if you want to 1724 01:07:33,780 --> 01:07:37,680 do that so it doesn't actually require 1725 01:07:35,099 --> 01:07:39,780 additional resources every journal is 1726 01:07:37,680 --> 01:07:42,240 hosted somewhere some are commercially 1727 01:07:39,780 --> 01:07:44,760 hosted some are hosted at institutions 1728 01:07:42,240 --> 01:07:46,440 right can you do this can you 1729 01:07:44,760 --> 01:07:47,460 collaborate with institutions and it 1730 01:07:46,440 --> 01:07:49,260 allows 1731 01:07:47,460 --> 01:07:50,819 one of the side projects of the 1732 01:07:49,260 --> 01:07:54,180 internship was to actually to 1733 01:07:50,819 --> 01:07:55,799 particularly focus on URM students who 1734 01:07:54,180 --> 01:07:58,500 might have otherwise difficulty in 1735 01:07:55,799 --> 01:08:00,359 finding internships in agencies etc but 1736 01:07:58,500 --> 01:08:03,059 who have otherwise the right skills to 1737 01:08:00,359 --> 01:08:04,380 do so part of it is that in order to get 1738 01:08:03,059 --> 01:08:05,640 them to that point you might need to do 1739 01:08:04,380 --> 01:08:07,079 some upskilling and this is an 1740 01:08:05,640 --> 01:08:08,039 opportunity to provide targeted 1741 01:08:07,079 --> 01:08:10,200 upskilling 1742 01:08:08,039 --> 01:08:11,460 how you run Stata okay your institution 1743
01:08:10,200 --> 01:08:14,280 doesn't have to say okay let's do this 1744 01:08:11,460 --> 01:08:16,199 in R we can do that right uh you don't 1745 01:08:14,280 --> 01:08:18,239 have a computer okay you've got a 1746 01:08:16,199 --> 01:08:20,520 Chromebook we can do this in the cloud right 1747 01:08:18,239 --> 01:08:22,380 there are tools to do all of this 1748 01:08:20,520 --> 01:08:24,480 and to get back to the curricular 1749 01:08:22,380 --> 01:08:25,920 development there are sample curricula 1750 01:08:24,480 --> 01:08:26,759 out there that sort of embed some of 1751 01:08:25,920 --> 01:08:29,400 those things whether you take them 1752 01:08:26,759 --> 01:08:30,600 wholesale or just take pieces of that 1753 01:08:29,400 --> 01:08:31,859 um that's the challenge I don't know of 1754 01:08:30,600 --> 01:08:34,259 a good resource for that you would 1755 01:08:31,859 --> 01:08:36,060 have probably collected more of them but 1756 01:08:34,259 --> 01:08:39,060 so 1757 01:08:36,060 --> 01:08:41,580 there's some examples and demos and 1758 01:08:39,060 --> 01:08:43,500 exercises on the TIER website and many 1759 01:08:41,580 --> 01:08:46,500 more in the works will be coming soon 1760 01:08:43,500 --> 01:08:46,500 uh 1761 01:08:46,859 --> 01:08:49,699 foreign 1762 01:08:57,799 --> 01:09:04,739 I don't know but what what one thing 1763 01:09:03,179 --> 01:09:06,779 an interesting exercise for instructor 1764 01:09:04,739 --> 01:09:09,060 to do is just take 1765 01:09:06,779 --> 01:09:10,920 an existing exercise and say okay what 1766 01:09:09,060 --> 01:09:13,040 do I have to build around this to make 1767 01:09:10,920 --> 01:09:13,040 it 1768 01:09:15,239 --> 01:09:20,880 and what we're saying about getting a 1769 01:09:17,460 --> 01:09:24,600 time saver at the same it truly is in 1770 01:09:20,880 --> 01:09:26,460 teaching because uh like once you've 1771 01:09:24,600 --> 01:09:29,580 done it a few times 1772 01:09:26,460 --> 01:09:31,500 it's easier what you're
teaching and the 1773 01:09:29,580 --> 01:09:33,420 big thing is that when students come to 1774 01:09:31,500 --> 01:09:35,339 your office you can help them 1775 01:09:33,420 --> 01:09:37,259 in reasonable reasonable amount of time 1776 01:09:35,339 --> 01:09:39,719 it actually resolves so 1777 01:09:37,259 --> 01:09:42,299 where whereas they spend much less time 1778 01:09:39,719 --> 01:09:44,339 just uh 1779 01:09:42,299 --> 01:09:46,699 uh spinning their wheels and getting 1780 01:09:44,339 --> 01:09:46,699 nowhere 1781 01:09:46,819 --> 01:09:51,380 yeah yep yep yep 1782 01:09:54,679 --> 01:10:01,440 yeah and we're just pissing people off 1783 01:09:58,380 --> 01:10:03,679 and one thing you can do is just somehow 1784 01:10:01,440 --> 01:10:06,900 make your students' work visible 1785 01:10:03,679 --> 01:10:09,480 so uh I have an ally in the Haverford 1786 01:10:06,900 --> 01:10:12,679 library where every student's senior 1787 01:10:09,480 --> 01:10:15,540 thesis gets archived electronically and 1788 01:10:12,679 --> 01:10:17,239 he just started also posting their 1789 01:10:15,540 --> 01:10:20,280 documentation with it 1790 01:10:17,239 --> 01:10:21,840 so and then you know some colleagues 1791 01:10:20,280 --> 01:10:24,540 kind of notice that and that there's a 1792 01:10:21,840 --> 01:10:27,260 little bit of interest a small things so 1793 01:10:24,540 --> 01:10:30,600 there's a demonstration effect 1794 01:10:27,260 --> 01:10:32,580 I mean one of the things is that I found 1795 01:10:30,600 --> 01:10:34,199 when working with my RAs on these kind 1796 01:10:32,580 --> 01:10:37,199 of things 1797 01:10:34,199 --> 01:10:39,179 it takes me a lot less time to have them 1798 01:10:37,199 --> 01:10:41,460 figure out the problem by telling them 1799 01:10:39,179 --> 01:10:43,980 how they should document it so that I 1800 01:10:41,460 --> 01:10:45,659 can look at it because half of those 1801 01:10:43,980 --> 01:10:47,760 cases they then figure it out themselves
1802 01:10:45,659 --> 01:10:51,620 so it actually reduces my work as well 1803 01:10:47,760 --> 01:10:51,620 right and they learn something from that 1804 01:10:53,460 --> 01:10:57,659 I think I've got time for for one last 1805 01:10:56,100 --> 01:11:02,040 question and I'll take the the 1806 01:10:57,659 --> 01:11:04,260 moderator's prerogative to to ask one 1807 01:11:02,040 --> 01:11:06,540 um I I wonder you know speaking about 1808 01:11:04,260 --> 01:11:10,199 all this and uh 1809 01:11:06,540 --> 01:11:12,179 I was you know I think uh some of 1810 01:11:10,199 --> 01:11:14,159 Diego's observations early on that this 1811 01:11:12,179 --> 01:11:15,659 is kind of something that doesn't 1812 01:11:14,159 --> 01:11:17,100 necessarily need to be a curriculum but 1813 01:11:15,659 --> 01:11:19,500 it can be woven through the the 1814 01:11:17,100 --> 01:11:20,880 curriculum in terms of getting buy-in and it 1815 01:11:19,500 --> 01:11:23,400 seems like 1816 01:11:20,880 --> 01:11:27,420 textbook publishers in particular could 1817 01:11:23,400 --> 01:11:28,980 be doing more to facilitate some of this 1818 01:11:27,420 --> 01:11:31,199 it just at a simple level you know if 1819 01:11:28,980 --> 01:11:33,480 you can start to take baby steps to get 1820 01:11:31,199 --> 01:11:35,460 people thinking about data citation and 1821 01:11:33,480 --> 01:11:37,500 data provenance and you know when you 1822 01:11:35,460 --> 01:11:39,540 get not to call anyone out but say you 1823 01:11:37,500 --> 01:11:41,280 get Wooldridge's textbook and you get the 1824 01:11:39,540 --> 01:11:43,620 you know R wooldridge package you have 1825 01:11:41,280 --> 01:11:46,080 no idea where most of those data came 1826 01:11:43,620 --> 01:11:47,699 from you know and it's the usual 1827 01:11:46,080 --> 01:11:49,500 prefabricated thing where all the data 1828 01:11:47,699 --> 01:11:51,360 are already cleaned and you know you just 1829 01:11:49,500 --> 01:11:53,159 know it's a data set with wages and 1830
01:11:51,360 --> 01:11:54,300 gender or whatever 1831 01:11:53,159 --> 01:11:56,300 um so I just wonder if any of you have 1832 01:11:54,300 --> 01:11:59,520 thought about kind of the role of 1833 01:11:56,300 --> 01:12:00,780 textbook Publishers or I guess I mean 1834 01:11:59,520 --> 01:12:02,159 the AAA also does some of this 1835 01:12:00,780 --> 01:12:05,699 dissemination but it's just in 1836 01:12:02,159 --> 01:12:07,640 disseminating sort of more simple 1837 01:12:05,699 --> 01:12:11,100 um these more sorts of simple things 1838 01:12:07,640 --> 01:12:13,380 that don't require quite as much buildup 1839 01:12:11,100 --> 01:12:15,600 but that you know a uh somebody who's 1840 01:12:13,380 --> 01:12:17,400 already teaching econometrics or say 1841 01:12:15,600 --> 01:12:20,219 labor economics class could then very 1842 01:12:17,400 --> 01:12:22,199 easily start to kind of move more 1843 01:12:20,219 --> 01:12:24,239 towards at least thinking about data 1844 01:12:22,199 --> 01:12:25,760 citation I think about it every day and 1845 01:12:24,239 --> 01:12:28,980 I go to sleep 1846 01:12:25,760 --> 01:12:30,600 and my dream is that one day uh 1847 01:12:28,980 --> 01:12:33,840 textbooks the same way that they have a 1848 01:12:30,600 --> 01:12:35,900 chapter it's between chapter remembering 1849 01:12:33,840 --> 01:12:38,820 um 1850 01:12:35,900 --> 01:12:41,880 right after chapter one do you have to 1851 01:12:38,820 --> 01:12:44,100 have an appendix on how to read diagrams 1852 01:12:41,880 --> 01:12:46,500 so how to plot in a two-dimensional 1853 01:12:44,100 --> 01:12:48,900 diagram how to read a table and well 1854 01:12:46,500 --> 01:12:51,239 with that stuff why isn't there an 1855 01:12:48,900 --> 01:12:52,739 appendix on working with data why isn't 1856 01:12:51,239 --> 01:12:56,280 there an appendix or 1857 01:12:52,739 --> 01:13:00,360 citing data on acknowledging that data 1858 01:12:56,280 --> 01:13:02,400 so the headlines the indicators revise 1859 
01:13:00,360 --> 01:13:05,400 but to that point I mean I 1860 01:13:02,400 --> 01:13:09,480 have Sonic there are evidence 1861 01:13:05,400 --> 01:13:11,460 real data because real data revises so 1862 01:13:09,480 --> 01:13:13,560 it's much more convenient to not even 1863 01:13:11,460 --> 01:13:15,600 use ALFRED 1864 01:13:13,560 --> 01:13:17,760 right 1865 01:13:15,600 --> 01:13:19,739 right but that's yeah you know it will 1866 01:13:17,760 --> 01:13:22,380 be it will be sweet 1867 01:13:19,739 --> 01:13:25,260 uh and one day it will happen 1868 01:13:22,380 --> 01:13:27,600 maybe our grandchildren will see it uh 1869 01:13:25,260 --> 01:13:29,880 where yes it's a foundational skill 1870 01:13:27,600 --> 01:13:32,219 and it's part of introductory textbook 1871 01:13:29,880 --> 01:13:35,280 material you're actually suggesting that 1872 01:13:32,219 --> 01:13:37,440 that sample way uses a list of wages 1873 01:13:35,280 --> 01:13:39,300 actually has its data source and its 1874 01:13:37,440 --> 01:13:41,179 cleaning to become that data file also 1875 01:13:39,300 --> 01:13:43,920 exposed right 1876 01:13:41,179 --> 01:13:45,480 that is specifically what I'm describing 1877 01:13:43,920 --> 01:13:47,100 I mean I'm teaching a class right now 1878 01:13:45,480 --> 01:13:49,140 where I have to sort of go back and 1879 01:13:47,100 --> 01:13:52,020 explain to students that every data set 1880 01:13:49,140 --> 01:13:53,760 they've ever gotten is completely fake 1881 01:13:52,020 --> 01:13:55,560 essentially that's just designed for 1882 01:13:53,760 --> 01:13:57,600 them to be able to use complete data 1883 01:13:55,560 --> 01:13:59,460 estimator on it without having to get 1884 01:13:57,600 --> 01:14:01,980 their hands dirty and that's a very 1885 01:13:59,460 --> 01:14:03,179 unrealistic way to view the world a and 1886 01:14:01,980 --> 01:14:05,400 that b they should understand where 1887 01:14:03,179 --> 01:14:06,060 those data came from in the first place 1888
01:14:05,400 --> 01:14:07,739 um 1889 01:14:06,060 --> 01:14:10,140 but it strikes me you know that's a lot 1890 01:14:07,739 --> 01:14:12,719 of heavy lifting on my part and so why 1891 01:14:10,140 --> 01:14:14,780 yeah you know 1892 01:14:12,719 --> 01:14:14,780 um 1893 01:14:15,080 --> 01:14:21,480 a good friend of uh 1894 01:14:18,360 --> 01:14:23,940 well FRED that has a companion website 1895 01:14:21,480 --> 01:14:25,980 with all the series uh referenced you 1896 01:14:23,940 --> 01:14:28,560 know all all data you know links to all 1897 01:14:25,980 --> 01:14:31,320 the you all the FRED data 1898 01:14:28,560 --> 01:14:33,420 that is used in the textbook so yay I mean 1899 01:14:31,320 --> 01:14:36,420 the other way to to think of it is you 1900 01:14:33,420 --> 01:14:37,640 might want to publish a website that's 1901 01:14:36,420 --> 01:14:40,860 a 1902 01:14:37,640 --> 01:14:43,260 unofficial addendum to a textbook that 1903 01:14:40,860 --> 01:14:45,360 says okay here's the clean data set that 1904 01:14:43,260 --> 01:14:47,640 comes with the textbook 1905 01:14:45,360 --> 01:14:49,980 here's one where we've actually 1906 01:14:47,640 --> 01:14:52,679 downloaded some lnl Supply data here's 1907 01:14:49,980 --> 01:14:56,640 the cleaning we've done for it that's 1908 01:14:52,679 --> 01:14:58,199 lesson 15 which we might not get to but 1909 01:14:56,640 --> 01:14:59,820 at least you sort of bring it forward to 1910 01:14:58,199 --> 01:15:01,320 and here's the same thing with some real 1911 01:14:59,820 --> 01:15:03,840 data and it might not actually reproduce 1912 01:15:01,320 --> 01:15:06,120 because maybe all that clean data was 1913 01:15:03,840 --> 01:15:08,159 you know not not fully fully compliant 1914 01:15:06,120 --> 01:15:11,280 with that or things like that 1915 01:15:08,159 --> 01:15:13,020 you know you may also be taking on a 1916 01:15:11,280 --> 01:15:15,780 little more than you really have to in 1917 01:15:13,020 --> 01:15:16,860 this
context I mean I think there's a 1918 01:15:15,780 --> 01:15:19,980 place 1919 01:15:16,860 --> 01:15:22,800 for students just learning about how to 1920 01:15:19,980 --> 01:15:26,520 do a technique and just having clean 1921 01:15:22,800 --> 01:15:28,500 data that they do this with just and you 1922 01:15:26,520 --> 01:15:30,420 know if they had to every single time go 1923 01:15:28,500 --> 01:15:31,800 back to the Micro Data and get rid of 1924 01:15:30,420 --> 01:15:35,340 the people who were in jail and 1925 01:15:31,800 --> 01:15:36,600 everything else then uh you know that 1926 01:15:35,340 --> 01:15:40,159 takes time attention away from just 1927 01:15:36,600 --> 01:15:40,159 throwing the techniques so so 1928 01:15:40,500 --> 01:15:43,739 like the obvious the kind of stuff 1929 01:15:42,300 --> 01:15:46,140 you're talking about building in is like 1930 01:15:43,739 --> 01:15:48,780 valuable stuff in itself maybe you know 1931 01:15:46,140 --> 01:15:51,179 the econometrics class and I'm just 1932 01:15:48,780 --> 01:15:54,480 sorry however I would say the key thing 1933 01:15:51,179 --> 01:15:57,480 to build in is whatever they do with it 1934 01:15:54,480 --> 01:15:58,800 they should make reproducible so as long 1935 01:15:57,480 --> 01:16:01,520 as they're writing scripts for whatever 1936 01:15:58,800 --> 01:16:04,679 exercises they do it might happen 1937 01:16:01,520 --> 01:16:07,739 so to be clear my particular classes 1938 01:16:04,679 --> 01:16:09,960 explicitly focused on measurement after 1939 01:16:07,739 --> 01:16:11,460 they've already had some some statistics 1940 01:16:09,960 --> 01:16:13,140 and so yeah part of the pedagogical 1941 01:16:11,460 --> 01:16:14,880 purposes to kind of highlight this for 1942 01:16:13,140 --> 01:16:16,679 them but yeah yeah the point point is 1943 01:16:14,880 --> 01:16:18,120 absolutely taken 1944 01:16:16,679 --> 01:16:20,880 um well unfortunately it seems like 1945 01:16:18,120 --> 01:16:23,400 we're out of 
official sea time for this 1946 01:16:20,880 --> 01:16:26,600 so I want to give everybody a round of 1947 01:16:23,400 --> 01:16:26,600 applause for participating 1948 01:16:27,080 --> 01:16:31,400 foreign 1949 01:16:28,880 --> 01:16:35,040 I guess look out for this to be posted 1950 01:16:31,400 --> 01:16:38,960 sometime depending on video quality and 1951 01:16:35,040 --> 01:16:38,960 join us for the next one that too