In most cases it is difficult (if not impossible) to take a simple random sample of individuals from a population in order to estimate the prevalence of an event. There are two reasons: a list of all individuals usually does not exist, and the cost of sampling is high.
For example, suppose 300 individuals must be randomly selected from a population of 200,000 cattle distributed across 1,500 farms. A simple random sample would require a list of all 200,000 animals, and such lists generally do not exist. Even if all 200,000 animals could be identified, a large number of farms would have to be visited, and visiting farms is normally costly in both money and time.
When a list of farms and an estimate of the number of individuals on each farm are available, two-stage sampling can be carried out instead. Using a two-stage approach, ProMESA calculates the number of farms (or clusters) that must be selected, depending on the expected prevalence, the acceptable error, the level of confidence required, the number of individuals to be sampled per farm and the rate of homogeneity. The number of individuals to sample per farm is determined by operational factors. Initially an approximate value may be chosen and the programme run to obtain a first result. The number may then be adjusted and the process repeated until a convenient result is found.
The rate of homogeneity (rho) is a difficult parameter to estimate. Its actual value may be calculated from the data of a previous survey with characteristics similar to those of the one being planned. A table of rho estimates for different disease conditions is provided below. The procedure for the exact calculation is shown as well. The following points must be kept in mind:
The formula for calculating the required number of clusters (n) is as follows:
$$n = \frac{z^{2}\, p\,(1 - p)\, D}{e^{2}\, b}$$
Where:
n | The number of clusters that have to be selected. |
p | The expected prevalence. |
D | The design effect. |
z | The critical value obtained from a standard normal distribution. For each level of confidence there is a corresponding value of z. The levels of confidence frequently used in biological studies are 90%, 95%, and 99%. The corresponding z values are 1.64, 1.96, and 2.58 respectively. |
e | The acceptable absolute error. |
b | The number of individuals to select per cluster. |
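As a rough illustration, the calculation can be sketched in Python. The sketch assumes the commonly used form n = z² · p(1 − p) · D / (e² · b) and the usual relation D = 1 + (b − 1) · rho between the design effect and the rate of homogeneity; neither is confirmed here as the exact implementation used by ProMESA. The function names, and the choice to round up to the next whole cluster, are illustrative assumptions.

```python
import math

# z values quoted in the text for the three supported confidence levels
Z_VALUES = {90: 1.64, 95: 1.96, 99: 2.58}


def design_effect(b, rho):
    """Design effect D, assuming D = 1 + (b - 1) * rho (an assumption)."""
    return 1 + (b - 1) * rho


def clusters_required(p, relative_error, confidence, b, rho):
    """Number of clusters n for a two-stage sample (hypothetical helper).

    p              expected prevalence (0..1)
    relative_error acceptable relative error (0..1)
    confidence     level of confidence in percent: 90, 95 or 99
    b              number of individuals to select per cluster
    rho            rate of homogeneity
    """
    z = Z_VALUES[confidence]
    e = p * relative_error          # absolute error
    d = design_effect(b, rho)
    n = z ** 2 * p * (1 - p) * d / (e ** 2 * b)
    return math.ceil(n)             # rounding up is an assumption


# Prevalence 0.40 and relative error 0.10 (absolute error 0.04),
# with illustrative values for confidence, b and rho.
print(clusters_required(0.40, 0.10, 95, b=10, rho=0.10))  # prints 110
```

With these inputs the design effect is 1.9 and the unrounded result is about 109.5 clusters.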
Expected prevalence | The assumed prevalence of the event in the population under study (usually based on previous studies, field data or the literature). When no information is available a value of 0.50 will yield the maximum sample size. Acceptable values: ≥ 0 and ≤ 1. |
Acceptable relative error | A measure of the desired precision. For example, if you assume a prevalence of 0.40 and a relative error of 0.10, the result will have a precision of ± 0.04 (that is, 0.40 × 0.10). In this case 0.04 is the absolute error. In general, the relative error should be ≤ 0.20. Acceptable values: ≥ 0 and ≤ 1. |
Level of confidence | The confidence that the user wants to have in the results. Acceptable values: 90%, 95% or 99%. |
Number of samples per cluster | The number of samples to be taken per cluster (farm or village) depends on operational factors and available resources. A value ≥ 5 is advisable. The number of clusters to be selected is determined as a function of this parameter, among others. The two values trade off against each other: the smaller the number of samples taken per cluster, the greater the number of clusters to be selected. Acceptable values: any integer. |
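The trade-off between samples per cluster and number of clusters, and the trial-and-error adjustment described earlier, can be sketched as a small loop. This is a minimal sketch under the same assumptions as the standard two-stage formula (n = z² · p(1 − p) · D / (e² · b), with D = 1 + (b − 1) · rho); the helper names are hypothetical, and the values of rho (0.10) and z (1.96, for 95% confidence) are illustrative only.

```python
import math


def clusters_required(p, relative_error, z, b, rho):
    """Clusters needed for b individuals per cluster (assumed formula)."""
    e = p * relative_error              # absolute error
    d = 1 + (b - 1) * rho               # assumed design effect relation
    return math.ceil(z ** 2 * p * (1 - p) * d / (e ** 2 * b))


def trade_off(p, relative_error, z, rho, per_cluster_options):
    """Return (b, clusters, total individuals) for each candidate b."""
    rows = []
    for b in per_cluster_options:
        n = clusters_required(p, relative_error, z, b, rho)
        rows.append((b, n, n * b))
    return rows


for b, n, total in trade_off(0.40, 0.10, 1.96, 0.10, [5, 10, 20, 40]):
    print(f"b={b:>2}  clusters={n:>3}  individuals={total}")
```

Under these assumptions, raising the number of samples per cluster from 5 to 40 lowers the number of clusters from 162 to 71, while the total number of individuals rises from 810 to 2840. Which combination is "convenient" depends on the relative cost of visiting a farm versus sampling an animal.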
Disease | Prevalence (%) | n | Number of clusters | Design effect | Rho |
Enzootic bovine leucosis | 1.51 | 2907 | 104 | 3.52 | 0.09 |
Enzootic bovine leucosis | 11.75 | 945 | 81 | 2.11 | 0.10 |
Enzootic bovine leucosis | 1.93 | 466 | 90 | 1.34 | 0.08 |
Infectious bovine rhinotracheitis | 31.97 | 2852 | 104 | 2.76 | 0.07 |
Infectious bovine rhinotracheitis | 47.88 | 969 | 82 | 1.71 | 0.07 |
Infectious bovine rhinotracheitis | 28.11 | 466 | 90 | 2.62 | 0.39 |
Bovine virus diarrhoea | 6.30 | 2799 | 108 | 6.95 | 0.23 |
Bovine virus diarrhoea | 19.07 | 970 | 82 | 5.74 | 0.42 |
Bovine virus diarrhoea | 69.74 | 466 | 90 | 2.76 | 0.42 |
Newcastle disease | 37.89 | 1470 | 253 | 1.89 | 0.18 |
Infectious bursal disease | 41.56 | 1470 | 253 | 2.56 | 0.37 |
Leptospira hardjo | 38.55 | 2861 | 104 | 2.54 | 0.06 |
Leptospira icterohaemorrhagiae | 13.60 | 2861 | 104 | 4.24 | 0.12 |
Leptospira grippotyphosa | 16.57 | 2861 | 104 | 3.91 | 0.11 |
Leptospira canicola | 5.38 | 2861 | 104 | 3.04 | 0.08 |
Brucella abortus | 7.74 | 1512 | 104 | 2.18 | 0.09 |
Brucella ovis | 11.71 | 1529 | 40 | 6.94 | 0.16 |
Brucella ovis | 10.99 | 1529 | 40 | 9.20 | 0.22 |
Anaplasma marginale | 3.78 | 2909 | 104 | 2.19 | 0.04 |
Anaplasma marginale | 4.32 | 1111 | 91 | 2.11 | 0.10 |
Trypanosoma vivax | 2.75 | 2909 | 104 | 2.56 | 0.06 |
Trypanosoma vivax | 30.87 | 1111 | 91 | 2.68 | 0.15 |
Trypanosoma congolense | 23.94 | 1111 | 91 | 2.51 | 0.13 |
Trypanosoma brucei | 24.39 | 1111 | 91 | 2.39 | 0.12 |
Eimeria spp. | 27.13 | 1010 | 104 | 2.53 | 0.18 |
Eimeria spp. | 15.63 | 1113 | 91 | 4.32 | 0.30 |
Strongyloides spp. | 12.08 | 1010 | 104 | 2.35 | 0.16 |
Strongyloides spp. | 4.04 | 1113 | 91 | 2.27 | 0.11 |
Trichostrongylus spp. | 69.01 | 1010 | 104 | 1.70 | 0.08 |
Trichostrongylus spp. | 48.43 | 1113 | 91 | 2.22 | 0.10 |
Moniezia spp. | 3.07 | 1010 | 104 | 1.46 | 0.05 |
Moniezia spp. | 15.90 | 1113 | 91 | 3.21 | 0.20 |
Fasciola spp. | 6.92 | 1113 | 91 | 4.09 | 0.27 |
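The relationship between the columns of the table can be explored with a short sketch. It assumes that the column labelled n is the total number of individuals sampled, that the average number of individuals per cluster is n divided by the number of clusters, and that rho = (D − 1) / (b − 1). These are assumptions for illustration, not the exact procedure behind the table; the function name is hypothetical.

```python
def rho_from_design_effect(n_individuals, n_clusters, design_effect):
    """Approximate rho from a design effect, assuming D = 1 + (b - 1) * rho."""
    b = n_individuals / n_clusters      # average individuals per cluster
    return (design_effect - 1) / (b - 1)


# Three rows taken from the table: (n, clusters, design effect)
rows = [
    (2907, 104, 3.52),   # Enzootic bovine leucosis, table rho 0.09
    (945, 81, 2.11),     # Enzootic bovine leucosis, table rho 0.10
    (466, 90, 2.62),     # Infectious bovine rhinotracheitis, table rho 0.39
]
for n, clusters, d in rows:
    print(round(rho_from_design_effect(n, clusters, d), 2))
```

For these three rows the sketch reproduces the tabulated rho to two decimals. Some other rows differ by 0.01 or 0.02, which suggests rounding or a somewhat different estimator in the original calculations.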
Instituto Nacional de Tecnología Agropecuaria |
EpiCentre, IVABS, Massey University |