BU-857-M

Sequential Procedure for Testing Germination Rates of Seeds Stored in Seedbanks

September 1984

by Issac Bekele¹
Department of Crop Science, The University of The West Indies, St. Augustine, Trinidad

¹ Paper prepared when the author was a visiting fellow in the Biometrics Unit, Cornell University, Ithaca, N.Y.

Summary

Samples of seeds stored for long-term conservation in seedbanks have to be monitored regularly in order to check the viability status of the seeds. In previous work, each inspection has been regarded as a separate statistical test of the null hypothesis that the sample needs regeneration. Here an overall procedure is developed that treats each inspection as part of a single process and subjects the inspections to overall error rates. Properties of the procedure are examined and compared with other procedures.

Key words: Conservation; Germination test; Overshoot; Power 1 type tests; Seedbanks.

1. Introduction

In technologically advanced countries farmers use modern cultivars (high yielding, disease resistant, etc.), as opposed to the traditional varieties which have commonly been used by farmers in developing countries. In recent times, however, these latter farmers have been slowly shifting to introduced cultivars and abandoning the traditional varieties. Continuous use of modern cultivars with desirable characteristics is feasible only if a broad genetic base is retained for each species of crop plant, to be used as a pool for producing new varieties. The shift towards introduced cultivars has exposed the natural gene pool to extinction (Frankel and Bennett, 1970).

In an attempt to control this process of genetic erosion, measures are being undertaken at different levels throughout the world. Seeds of different species of traditional cultivated crops are being systematically collected and stored under conditions believed to prolong the survival of the seeds. Such storage facilities are termed 'genebanks' or 'seedbanks'.
This approach is believed to be the cheapest and safest method of conserving crop genetic materials. Each sample of seeds is given a unique identification number, either at the point of collection from the field or at the time of exchange, and is referred to as an accession. All the accessions are kept under similar conditions but each is monitored separately.

Although the process of aging is believed to slow down under proper storage conditions, regular germination tests should be carried out on samples taken from an accession to check whether viability has dropped to a level that requires regeneration of the accession. It has been argued that increases in the percentage of cells of surviving seeds which show chromosome aberrations, and in the incidence of mutant phenotypes in succeeding generations, are correlated with loss of viability (Abdella and Roberts, 1968 and 1969). Let $p$ denote the proportion of viable seeds and $p_{min}$ be the minimum $p$ such that its consequences for chromosome aberrations in surviving seeds and for mutant phenotypes in succeeding generations are within tolerable limits. Hence the accession can be kept in storage without a need for regeneration as long as $p$ does not drop below $p_{min}$. But if $p$ drops to $p_{min}$, then the accession must be regenerated and new seeds stored.

Monitoring viability involves germinating seeds sampled from the accession. Usually the first test is carried out $t_1$ years after initial storage, and a formal statistical test is made using the data from the germination test to determine whether or not to regenerate the accession. If the evidence is against regeneration, the seeds are kept in the store until the next inspection time; otherwise the accession is regenerated and new seeds are stored. Thus, before an accession is regenerated, a number of tests are carried out on groups of seeds sampled from it at different stages of its life in the store.
Since these tests are destructive, sufficient seeds must be stored initially to ensure the availability of seeds for exchange, for successive tests and for regeneration when it is necessary. Hence both the frequency of inspection and the number of seeds used for each test are important factors in determining the initial size of an accession. Therefore, adoption of a statistical procedure that requires fewer seeds for tests is highly desirable.

The sizes of the overall error rates are also important. The error rate that most needs to be controlled is the probability of failing to regenerate the accession. If this rate is high, in the long run the seedbank would lose some of its most valuable genetic materials. Secondly, it is desirable that the procedure stop at, or as close as possible to, the true time of regeneration, because this could cut the long-term cost of the seedbank.

A sequential probability ratio test (SPRT) for testing percentage germination of seeds has been suggested for use in seedbanks (Ellis, Roberts and Whitehead, 1980 and Whitehead, 1981). The SPRT and the fixed sample approach both consider inspections at different times as unrelated statistical problems rather than as parts of an overall process, and result in separate significance statements (inspection-wise error rates). Although the inspection-wise error rates are known in both cases, the overall error rates are unknown. Nevertheless, it is possible to estimate the unknown overall error rates for each of these approaches by computer simulation for comparison purposes.

At inspection time $t_i$, the new procedure makes use of information from all inspections up to time $t_{i-1}$ and updates it with current information from the germination test. Based on this cumulated information about the viability condition of the seeds, a decision is made whether or not to regenerate the accession. Hence the whole monitoring process is treated as a single act.
The method is based on the assumption that, for any fixed time $t$, the number of germinating seeds out of $n$ tested is binomially distributed with probability of germination $p(t)$. In addition, it is assumed that the logit of $p(t)$ is a linear function of $t$. The test procedure is developed, with some modification, by analogy to the power 1 type tests of Darling and Robbins (1967, 1968) for iid normal random variables.

2. Formulation of the Problem

Let $p(t_i)$ denote the germination rate of the accession at time $t_i$ and $T$ be the true time of regeneration ($T$ is unknown). Next let

$$p_0 = p(t_0) \quad \text{and} \quad p_{min} = p(T).$$

$p_0$ is the initial germination rate and $p_{min}$ is the terminal germination rate. Hence $T$ denotes the true time it takes for $p(t_i)$ to drop from $p_0$ to $p_{min}$. At each inspection time a germination test is made and the following hypotheses are assessed:

$$H_0: p \le p_{min} \qquad H_A: p > p_{min}$$

The accession is kept in the store as long as the evidence supports $H_A$ and there are sufficient seeds for future testing.

Now consider a case where tests are carried out on a single-seed basis and let $t_1, t_2, \ldots, t_i, \ldots$ denote predetermined inspection times (note that the $t_i$'s need not all be different, since in practice tests are carried out on a number of seeds at any given inspection time). Define

$$x_i = \begin{cases} 1, & \text{if a seed planted at } t_i \text{ germinated} \\ 0, & \text{otherwise.} \end{cases}$$

If $P(x_i = 1) = p(t_i)$, then $x_i$ is a Bernoulli random variable with parameter $p(t_i)$. The log-likelihood of $p(t_i)$ is given by:

$$l(p(t_i)) = \sum x_i \log\{p(t_i)/(1 - p(t_i))\} + \sum \log\{1 - p(t_i)\}. \qquad (2.1)$$

Let

$$\mathrm{logit}(p(t_i)) = \log\{p(t_i)/(1 - p(t_i))\} \qquad (2.2)$$

be denoted by $R(t_i)$. Assume that $R(t_i)$ has the following form:

$$R(t_i) = R_0 - \beta t_i \qquad (2.3)$$

where $R_0$ is the logit of $p_0$ and $\beta$ is the rate of deterioration of seeds per unit time on a logistic scale. It is a general parameter that includes the true rate of deterioration.

Hence the log-likelihood of $p(t_i)$ reparameterized in terms
of $\beta$ is:

$$l(\beta) = \sum x_i (R_0 - \beta t_i) - \sum \log\{1 + \exp(R_0 - \beta t_i)\}. \qquad (2.4)$$

Under this parameterization, it is desirable to regenerate the accession when $R(t_i)$ drops to $R_1$ $(= R(T))$, and to maintain the accession in the store otherwise. $R_1$ is the logit of $p_{min}$.

3. Test Procedure

The test statistics are defined and the stopping rule is given below. An approximate overshoot correction is incorporated into the procedure.

3.1 Derivation of Test Statistics

If $s$ denotes the current time, it is desirable to regenerate the accession when $s$ coincides with $T$, where $s \le T$. Suppose that each time an inspection is made it is pretended that 'it is now time to regenerate the accession'. Let $\beta_s$ denote the rate of deterioration of seeds under this pretense. Hence, at time $s$ we have the following logistic regression line:

$$R_s(t_i) = R_0 - \beta_s t_i \quad \text{for } t_i = t_1, t_2, \ldots, s \qquad (3.1.1)$$

where

$$\beta_s = (R_0 - R_1)/s. \qquad (3.1.2)$$

The true logistic regression line is

$$R_T(t_i) = R_0 - \beta_T t_i \qquad (3.1.3)$$

where

$$\beta_T = (R_0 - R_1)/T. \qquad (3.1.4)$$

$\beta$ includes all the $\beta_s$'s and $\beta_T$. The hypotheses can now be expressed in terms of $\beta$ as follows:

$$H_0: \beta_s \le \beta_T \qquad H_A: \beta_s > \beta_T$$

Figure 3.1.1 shows the relationship between $R_s(t_i)$ and $R_T(t_i)$.

(Figure 3.1.1 goes here)

Now from (3.1.2) and (3.1.4), $\beta_s > \beta_T$ as long as $s < T$. Hence it is desirable to regenerate the accession when $\beta_s = \beta_T$. Define

$$Z = \sum t_i (Y_i - E_s(Y_i)) \qquad (3.1.5)$$

and

$$V = \sum t_i^2 n_i E_s(Y_i/n_i)[1 - E_s(Y_i/n_i)]. \qquad (3.1.6)$$

Summation is over all inspection times up to the current time $s$. $Y_i$ is the number of germinating seeds among the $n_i$ seeds tested at time $t_i$, and $E_s$ is the expectation under the pretended assertion 'it is now time to regenerate the accession' (refer to Appendix B). Hence $E_s(Y_i) = n_i p_s(t_i)$. The $p_s(t_i)$'s are computed from the logits $R_s(t_i)$.

$$E(Z) \begin{cases} > 0 & \text{for all } s < T \\ = 0 & \text{at } s = T \\ < 0 & \text{for } s > T. \end{cases} \qquad (3.1.7)$$
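The computation of $Z$ and $V$ in (3.1.5) and (3.1.6) can be illustrated with a short Python sketch. It is not taken from the paper: the function names (`logit`, `inv_logit`, `z_and_v`) and the argument layout are hypothetical, and the current time $s$ is assumed to be the last entry of the list of inspection times.

```python
import math


def logit(p):
    """Logit transform, as in (2.2)."""
    return math.log(p / (1.0 - p))


def inv_logit(r):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-r))


def z_and_v(times, n_tested, n_germinated, p0, p_min):
    """Return (Z, V) of (3.1.5) and (3.1.6) at the current time s = times[-1].

    times        -- inspection times t_1, ..., s (assumed increasing)
    n_tested     -- n_i, seeds tested at each inspection
    n_germinated -- Y_i, seeds that germinated at each inspection
    """
    s = times[-1]
    r0, r1 = logit(p0), logit(p_min)
    beta_s = (r0 - r1) / s                    # pretended rate, (3.1.2)
    z = 0.0
    v = 0.0
    for t, n, y in zip(times, n_tested, n_germinated):
        p_s = inv_logit(r0 - beta_s * t)      # germination rate on the pretended line (3.1.1)
        z += t * (y - n * p_s)                # t_i (Y_i - E_s(Y_i))
        v += t * t * n * p_s * (1.0 - p_s)    # t_i^2 n_i p_s (1 - p_s)
    return z, v


if __name__ == "__main__":
    # Illustrative numbers only: two inspections of 100 seeds each.
    print(z_and_v([5, 10], [100, 100], [99, 97], p0=0.99, p_min=0.85))
```

When the observed counts lie well above what the pretended line predicts, $Z$ is large and positive, which is the situation expected for $s < T$ in (3.1.7).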
So $E(Z)$ is a decreasing function of $t$, and $Z$ has a different distribution at each time $t_i$.

Now, by analogy to the Darling and Robbins (1967, 1968) procedure (Appendix A), modified (Appendix B) to serve the requirements of seedbanks, the following stopping rule can be used:

regenerate the accession if $Z \le a(v)$; continue otherwise, (3.1.8)

where

$$a(v) = \{(v+1)[\log(v+1) - 2\log 2\alpha]\}^{1/2}. \qquad (3.1.9)$$

$\alpha$ is the type I error of the Darling and Robbins procedure and can be chosen as small as desired. Then the following hold:

$$P(\text{stopping too late}) < \alpha \qquad (3.1.10)$$

$$P(\text{stopping too early}) \to 0 \text{ as } n \to \infty. \qquad (3.1.11)$$

The test terminates with probability 1 as $n \to \infty$ (refer to Appendix C for proofs).

So at each inspection time, $Z$ and $a(v)$ are computed and, based on the evidence, either the accession is regenerated or sampling is continued. The procedure controls the probability of stopping too late as desired. Secondly, the test terminates at $t_i = T$ with probability 1 as $n$ increases.

3.2 Correction for Overshoot

Examination of the properties of the procedure indicates that it is certainly conservative. The probability of failing to stop is lower than the desired level $\alpha$, and for small sample sizes the procedure could lead to early stoppings. Therefore, an approximate correction is incorporated into the procedure by analogy to Siegmund (1979) and Whitehead (1981). At the current inspection time $s$, information increases at rate $I_s$, where

$$I_s = n_s s^2 p_s(s)(1 - p_s(s)). \qquad (3.2.1)$$

Then an approximate correction is

$$O_s = 0.583\sqrt{I_s}. \qquad (3.2.2)$$

The procedure (3.1.8) becomes:

regenerate the accession if $Z_c \le a(v)$; continue otherwise, (3.2.3)

where

$$Z_c = Z + O_s. \qquad (3.2.4)$$

The correction increases at a smaller rate than $V$, and therefore the properties (3.1.10) and (3.1.11) still hold. The correction can be especially effective when small sample sizes are used.

4. Discussion

Computer simulation was used to examine the properties of the procedure and to make comparisons between different tests.
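A simulation of this kind could be organized along the following lines. This is a minimal Python sketch under stated assumptions, not the program used for the paper: the function names, the fixed random seed, and the use of a constant number `n` of seeds per inspection are all hypothetical choices, the boundary is taken as $a(v) = \{(v+1)[\log(v+1) - 2\log 2\alpha]\}^{1/2}$, and the information increment at time $s$ is read as $n s^2 p_s(s)(1 - p_s(s))$.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(r):
    return 1.0 / (1.0 + math.exp(-r))


def boundary(v, alpha):
    """Stopping boundary a(v), in the form of (3.1.9)."""
    return math.sqrt((v + 1.0) * (math.log(v + 1.0) - 2.0 * math.log(2.0 * alpha)))


def stopping_time(rng, p0, p_min, T, times, n, alpha):
    """Simulate one accession; return the inspection time at which
    regeneration is called, or None if the rule never stops."""
    r0, r1 = logit(p0), logit(p_min)
    beta_true = (r0 - r1) / T                  # true deterioration rate
    counts = []                                # germinated seeds per inspection
    for k, s in enumerate(times):
        p_true = inv_logit(r0 - beta_true * s)
        counts.append(sum(rng.random() < p_true for _ in range(n)))
        beta_s = (r0 - r1) / s                 # pretended rate at current time s
        z = 0.0
        v = 0.0
        for t, y in zip(times[:k + 1], counts):
            p_s = inv_logit(r0 - beta_s * t)
            z += t * (y - n * p_s)
            v += t * t * n * p_s * (1.0 - p_s)
        # Overshoot correction: 0.583 times the square root of the
        # information increment at s (an assumed reading of (3.2.1)).
        p_ss = inv_logit(r0 - beta_s * s)
        z_c = z + 0.583 * math.sqrt(n * s * s * p_ss * (1.0 - p_ss))
        if z_c <= boundary(v, alpha):
            return s
    return None


def too_late_rate(reps, seed, T, times, **kwargs):
    """Fraction of replicates that have not stopped by time T."""
    rng = random.Random(seed)
    late = 0
    for _ in range(reps):
        s = stopping_time(rng, T=T, times=times, **kwargs)
        if s is None or s > T:
            late += 1
    return late / reps


if __name__ == "__main__":
    inspection_times = list(range(5, 101, 5))  # every five years up to T = 100
    print(too_late_rate(200, 1, T=100, times=inspection_times,
                        p0=0.99, p_min=0.85, n=100, alpha=0.05))
```

The settings in the usage example mirror those described for Table 4.1 ($T = 100$, $p_{min} = 0.85$, $\alpha = 0.05$, five-year intervals), but with fewer replicates; the printed value is an estimate from this sketch only and should not be expected to reproduce the tabulated figures.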
Table 4.1 gives estimated error probabilities ($\alpha$) for 1000 replicates each for two different sample sizes. $T$ was set at 100 years and $p_{min}$ at 0.85. The value used for $\alpha$ was 0.05.

Table 4.1 Estimated error probabilities ($\alpha$) for two initial germination rates $p_0$ = 0.99 and 0.95.

                 n
  p_0       100      1000
  0.95     0.001    0.001
  0.95     0.002    0.000

For each of these simulations, inspection intervals of equal size (five years) were used, with the first inspection at year five. The overall error rate was considerably smaller than $\alpha$, as expected. It is also important to note that the sample size has no appreciable effect on the error rate.

Table 4.2 gives estimates of the probability of stopping too late for the SPRT, the new approach and the fixed sample case. For each case an estimate of $\alpha$ based on 1000 runs is given. Two initial germination rates were used. Groups of 40 seeds were used for the SPRT, which led to the use of an average of 116 and 194 seeds for $p_0$ = 0.99 and 0.95 respectively at any given inspection time. For the fixed sample case 467 seeds were used per test. An inspection interval of 20 years was used, starting with year 20, until the test terminated. $T$ was fixed at 100 years.

Table 4.2 Estimates of probability of stopping too late for the three procedures

                                p_0
  Tests                     0.95     0.99
  SPRT                      0.02     0.049
  New Procedure*            0.009    0.01
  Fixed Sample              0.25     0.058
  New Procedure (n=467)     0.004    0.004

* The new approach's estimates are based on sample sizes of 194 and 116 for $p_0$ = 0.95 and 0.99 respectively, which are the same as the averages for the SPRT.

The fixed sample approach requires 467 seeds to achieve the same result as the SPRT. An elaborate comparison of the SPRT and the fixed sample approach is given by Ellis and others (1980). The fixed sample approach is extremely wasteful compared with the other two. The SPRT stops too late on average about 3.5 times more often than the new approach for the same average sample size. Hence the new approach performs better than the SPRT in this respect.
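The figure of about 3.5 can be checked against Table 4.2 with a few lines of Python. The sketch below assumes that the figure is the simple average of the two SPRT-to-new-procedure ratios, one for each initial germination rate; the text does not say how it was obtained, so this reading is an assumption.

```python
# Estimated probabilities of stopping too late, copied from Table 4.2,
# keyed by initial germination rate p_0.
sprt = {0.95: 0.02, 0.99: 0.049}
new_procedure = {0.95: 0.009, 0.99: 0.01}

# Ratio of SPRT to new-procedure error rates for each p_0.
ratios = {p0: sprt[p0] / new_procedure[p0] for p0 in sprt}

# Simple average of the two ratios (assumed reading of "on average").
mean_ratio = sum(ratios.values()) / len(ratios)

for p0, r in ratios.items():
    print(f"p_0 = {p0}: ratio = {r:.2f}")
print(f"mean ratio = {mean_ratio:.2f}")
```

The two ratios are roughly 2.2 and 4.9, and their average is close to the value quoted in the text.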
The use of the error rate to compare different procedures without considering the effect of inspection times could be unsatisfactory. It is of interest to see the magnitude of this error when the inspection grid misses the desired time of regeneration. This is one of the serious problems of predetermined inspection times. If the last inspection is carried out at $t_m$, where $t_m > T$, the error rate should be higher for any procedure. Its size depends on the difference $t_m - T$.

Simulation was carried out to study the effect of inspection times on error rates for the SPRT and the new approach (Table 4.3). Inspections were made at equal intervals of 20 years, starting at 20 years, in both cases. $T$ was fixed at 90 years and an initial germination rate of 0.99 was used. 1000 replicated runs were made for both approaches. Groups of 40 seeds were used for the SPRT, which led to the use of an average of 145 seeds per inspection time. So 145 seeds per test were used in the simulation for the new procedure.

Table 4.3 Frequency of stoppages at different times of inspection out of 1000 replicates each for SPRT and new procedure.

  Inspection times (yrs)        Frequency
                             SPRT    New Procedure
           20                   0          0
           40                   0          0
           60                   0         39
           80                 260        745
          100                 738        216
          120                   2          0

When the last inspection is carried out after the true time of regeneration, which could happen in practice if predetermined times of inspection are used, the SPRT will stop more frequently at the first time after $T$ than at the last time before $T$. For the same average sample size, the new procedure will, however, stop more frequently at the last time before $T$ than at the first time after $T$.

Although the adoption of statistical procedures with desirable properties, such as seed saving and ideally smaller error probabilities, is important, their vulnerability to changes in inspection times must be accounted for as well.
In practice this is a more serious problem because, for thousands of accessions of different species of crop plants, the desirable times of regeneration are not known. An objective method of estimating these inspection times should be sought.

Certainly the new procedure indicates better performance, in terms of a smaller error rate, than the SPRT using the same average sample size per inspection. Even if the last inspection is carried out after the true time of regeneration, fewer accessions will be regenerated after $T$ years if the new procedure is used. Another important property of the new procedure is that it enables stochastic estimation of inspection times using germination information. Therefore, the procedure is a powerful statistical tool.

Acknowledgement

The author wishes to thank Dr. C. E. McCulloch, Dr. D. Robson and Dr. J. Whitehead for helpful and inspiring discussions and valuable comments at different stages of the development of this work. Assistance provided by Cornell University is highly appreciated.

References

Abdella, F. H. and Roberts, E. H. (1968). Effects of temperature, moisture and oxygen on the induction of chromosome damage in seeds of barley, broad beans and peas during storage. Annals of Botany 32, 119-136.
Abdella, F. H. and Roberts, E. H. (1969). The effects of temperature and moisture on the induction of genetic changes in seeds of barley, broad beans and peas during storage. Annals of Botany 33, 153-167.
Bekele, I. (1981). Monitoring Accessions in Seedbanks. M.S. Thesis (unpublished). University of Reading: Reading.
Darling, D. A. and Robbins, H. (1967). Inequalities for the sequences of sample means. Proceedings of National Academy of Science 57, 1577-1580.
Darling, D. A. and Robbins, H. (1967). Confidence sequences of sample mean, variance and median. Proceedings of National Academy of Science 58, 66-68.
Darling, D. A. and Robbins, H. (1968). Some further remarks on inequalities for sample sums.
Proceedings of National Academy of Science 60, 1175-1182.
Ellis, R., Roberts, E. H. and Whitehead, J. (1980). A new more economic and accurate approach to monitoring the viability of accessions during storage in seedbanks. Plant Genetic Resources Newsletter 41, 3-17.
Frankel, O. H. and Bennett, E. (1970). Genetic Resources in Plants: their Exploration and Conservation. Oxford: Blackwell Scientific Publications.
International Board for Plant Genetic Resources (1976). Report of the IBPGR Working Group on Engineering, Design and Cost Aspects of Long-term Seed Storage Facilities. Rome: IBPGR.
Siegmund, D. (1979). Corrected diffusion approximation in certain random walk problems. Advances in Applied Probability 11, 701-719.
Whitehead, J. (1981). The use of the sequential probability ratio test for monitoring the percentage germination of accessions in seedbank. Biometrics 37, 129-136.

Appendix A

Power One type tests for Normal Random Variables

Let $x_1, x_2, \ldots$ be iid normal random variables with $E(x_i) = \theta$ and $V(x_i) = 1$. Suppose interest lies in testing

$$H_0: \theta \le 0 \quad \text{versus} \quad H_1: \theta > 0.$$

Assume it is desirable to continue with sampling as long as $H_0$ is true, and otherwise to quit sampling and take some appropriate action. Darling and Robbins have suggested the following type of procedure: continue with sampling as long as $S_m < a(m)$, where

$$S_m = x_1 + \cdots + x_m \qquad (A.1)$$

and

$$a(m) = \{(m+1)[\log(m+1) - 2\log 2\alpha]\}^{1/2}. \qquad (A.2)$$

Under $H_0$: $S_m \sim N(0, m)$.

Each time a sample is drawn, both $S_m$ and $a(m)$ are computed and compared. The procedure calls for termination of inspection when $S_m \ge a(m)$. Darling and Robbins show that:

$$P_{H_0}(S_m \ge a(m) \text{ for some } m \ge 1) \le \alpha \qquad (A.3)$$

$$P_{H_1}(S_m \ge a(m) \text{ for some } m \ge 1) \to 1 \text{ as } m \to \infty. \qquad (A.4)$$

In the next section a modified version of this procedure, suited to the special case of monitoring percentage viability, is given.
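The behaviour of a power one boundary for normal sums can be explored numerically. The Python sketch below is illustrative only: it uses the boundary in the form $a(m) = \{(m+1)[\log(m+1) - 2\log 2\alpha]\}^{1/2}$, as in (3.1.9), and estimates by Monte Carlo how often $S_m$ reaches the boundary within a finite horizon. The function names, the horizon of 200 observations, the number of replicates and the seed are all arbitrary assumptions, and a finite horizon can only approximate the "for some $m \ge 1$" statements.

```python
import math
import random


def a(m, alpha):
    """Boundary a(m) for the cumulative sum after m observations."""
    return math.sqrt((m + 1) * (math.log(m + 1) - 2.0 * math.log(2.0 * alpha)))


def crossing_rate(theta, alpha=0.05, horizon=200, reps=2000, seed=0):
    """Estimate P(S_m >= a(m) for some m <= horizon) when x_i ~ N(theta, 1)."""
    rng = random.Random(seed)
    crossed = 0
    for _ in range(reps):
        s = 0.0
        for m in range(1, horizon + 1):
            s += rng.gauss(theta, 1.0)
            if s >= a(m, alpha):
                crossed += 1
                break
    return crossed / reps


if __name__ == "__main__":
    print("theta = 0:", crossing_rate(0.0))   # should not exceed alpha by much, if at all
    print("theta = 1:", crossing_rate(1.0))   # should be close to 1
```

With $\theta = 0$ the estimated crossing frequency is expected to stay below $\alpha$, while with $\theta > 0$ the sum grows linearly in $m$ and overtakes the boundary, which grows only like $\sqrt{m \log m}$.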
Appendix B

Derivation of test statistics

First, transitional test statistics $Z_1$ and $v_1$ are derived and the test procedure is outlined by analogy to the Darling and Robbins (1967, 1968) procedure. Then the statistics $Z$ and $v$ of Section 3 are formally derived.

I. For $\beta$ close to $\beta_0$, the log-likelihood of $\beta$ can be expanded approximately as:

$$l(\beta) \approx l(\beta_0) + (\beta - \beta_0) l'(\beta_0) + \tfrac{1}{2}(\beta - \beta_0)^2 l''(\beta_0). \qquad (B.1)$$

The statistics $Z_1$ and $v_1$ can be used, where

$$Z_1 = -l'(\beta_0) \quad \text{and} \quad v_1 = -l''(\beta_0). \qquad (B.2)$$

$Z_1$ is a linear function of the efficient score and $v_1$ is Fisher's information. If $n_i$ seeds are used for the germination test at time $t_i$ and $Y_i$ denotes the number of seeds that germinated, then

$$Z_1 = \sum t_i (Y_i - n_i p_T(t_i)) \qquad (B.3)$$

and

$$v_1 = \sum t_i^2 n_i p_T(t_i)[1 - p_T(t_i)]. \qquad (B.4)$$

The sequential test is based on $S_m$ and $m$ of the Darling and Robbins test replaced by $Z_1$ and $v_1$, respectively. This analogy is reasonable since, under $H_0$, $Z_1 \sim AN(0, v_1)$. Then by analogy to (A.3),

$$P_{H_0}(Z_1 \ge a(v_1) \text{ for some } v_1 > 0) \le \alpha$$

[...]

$$\beta_{T_1} = (R_0 - R_1)/T_1 \qquad (B.8)$$

where $R_0$ and $R_1$ are the logits of $p$ at $t_0$ and $T$. Given information on the status of the seeds in storage up to time $t_i$ $(= s)$, $T_1$ is the future time at which it would be necessary to undertake regeneration of the accession if $T_1 = T$. Then from (B.8), continue with inspection as long as

$$[\ldots] \qquad (B.9)$$

as long as

$$-l'((R_0 - R_1)/s) > a\{-l''((R_0 - R_1)/s)\}. \qquad (B.10)$$

Then

$$-l'((R_0 - R_1)/s) = \sum t_i \{Y_i - E_s(Y_i)\} \qquad (B.11)$$

$$-l''((R_0 - R_1)/s) = \sum t_i^2 n_i E_s(Y_i/n_i)[1 - E_s(Y_i/n_i)] \qquad (B.12)$$

where $E_s(Y_i)$ is the expectation of $Y_i$ under the pretense that it is now time to regenerate the accession. (B.10) holds for all time $t_i$ [...]

Appendix C

[...]

$$P(\text{stopping too late}) < \alpha \qquad (C.3)$$

$$P(\text{stopping too early}) \to 0 \text{ as } n \to \infty \qquad (C.4)$$

$$P(\text{stopping at desired time } T) \to 1 \text{ as } n \to \infty. \qquad (C.5)$$

Proofs

(C.3):

$$P(\text{stopping too late}) = P_{H_0}(Z > a(v) \text{ for all } t_i \le T) \le P_{H_0}(Z > a(v) \text{ for } t_i = T).$$

But at $t_i = T$,

$$P_{H_0}(Z > a(v)) = P_{H_0}(Z_1 > a(v_1)) \le \alpha.$$

Hence the result.

(C.4):

$$P(\text{stopping too early}) = P_{H_A}(Z \le a(v) \text{ for any } t_i < T) = P_{H_A}(Z_1 \le a(v) - \Delta \text{ for any } t_i < T).$$

At any given time $t_i$, $a(v)$ and $\Delta$ are increasing functions of $n$: $a(v)$ increases by an order of $n^{1/2}$ and $\Delta$ by an order of $n$. $\Delta > 0$ for all $t_i < T$. Since $a(v) - \Delta \to -\infty$ as $n \to \infty$,

$$P_{H_A}(Z_1 \le a(v) - \Delta \text{ for any } t_i < T) \to 0 \text{ as } n \to \infty,$$

as required.

(C.5):

$$P(\text{stopping at time } T) = P_{H_A}(Z \le a(v) \text{ for } t_i = T) = P(Z_1 \le a(v) - \Delta \text{ for } t_i = T) \to 1 \text{ as } n \to \infty,$$

noting that $\Delta = 0$ at $t_i = T$ and $a(v) \to \infty$ as $n \to \infty$. It follows that the test terminates with probability 1 as $n \to \infty$.

Title: Stochastic Estimation of Inspection Times for Monitoring Viability of Seeds in 'Genebanks'.

Summary: A procedure for estimating inspection times, based on the techniques for monitoring viability of seeds suggested by Bekele (1984), is explained.

Introduction: The background of the problem is summarized. The importance of objective estimation of the inspection times is explained. The distributional assumptions about survival of seeds are given and the model applied is developed.

Test Procedure: The test statistics are briefly defined and the decision process is explained. The properties of the test are outlined.

Estimation of Inspection Times: A technique for estimating confidence sequences is given. The use of the confidence intervals for estimating inspection times is explained. Properties of the confidence interval are given.

Discussion: Predetermined and estimated inspection times are compared using simulation results. Modification of the estimated times is suggested.

Figure 3.1.1: Relationship between $R_s(t_i)$ and $R_T(t_i)$ (vertical axis $R(t)$; horizontal axis $t$, with $0$, $s$ and $T$ marked).