Sample size: stratified random sample

Theory

To estimate the prevalence of an event in a population, it is necessary to take a representative sample of individuals from that population. Individuals can be selected for sampling by different methods. When the distribution of the event is known to be affected by a given variable, it is convenient to stratify the population. For example, the proportion of individuals with antibodies against an agent may be greater in animals of a given breed, or may vary with age, geographical region or type of production system. To stratify means to divide the population under study into groups, or strata, before selecting the individuals to include in the sample. Once the population is stratified, individuals are selected from within each stratum. When prevalence differs between strata, stratification generally produces more precise estimates of prevalence than a simple random sample of the same size. The equation for calculating the sample size (n) is as follows:
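
A form consistent with the symbols defined in this section is sketched here. It assumes the standard textbook formula for the sample size needed to estimate a proportion under stratified random sampling; that choice is an assumption, so treat the expression as a reconstruction rather than a confirmed transcription.

```latex
% Assumed form: standard stratified-sampling sample size for a proportion.
% e, n_i, p_i, N, AE, z and w_i are the symbols defined in this section.
n = \frac{\displaystyle\sum_{i=1}^{e} \frac{n_i^{2}\, p_i (1 - p_i)}{w_i}}
         {\displaystyle\frac{N^{2}\, AE^{2}}{z^{2}} + \sum_{i=1}^{e} n_i\, p_i (1 - p_i)}
```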

Where:
e The number of strata.
ni The number of individuals in stratum i.
pi The expected prevalence in stratum i.
N The total number of individuals in the population.
AE The acceptable absolute error.
z The value obtained from the standard normal distribution. Each level of confidence has a corresponding value of z. The levels of confidence most frequently used in biological studies are 90%, 95% and 99%; the corresponding values of z are 1.645, 1.96 and 2.58, respectively.
wi A weighting factor for each stratum, calculated as follows:
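
A plausible form of the weighting factor, assuming proportional allocation (each stratum is weighted by its share of the population), is sketched here. The allocation rule is an assumption; other rules would give different weights.

```latex
% Assumed: proportional allocation, so the weights sum to 1 over the e strata.
w_i = \frac{n_i}{N}, \qquad \sum_{i=1}^{e} w_i = 1
```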

Practice

Level of confidence The confidence that the user wants to have in the results.
Acceptable values: 90%, 95% or 99%.
Acceptable relative error A measure of the desired precision, expressed as a fraction of the expected prevalence. For example, if you assume a prevalence of 0.40 and a relative error of 0.10, the result will have a precision of ± 0.04 (that is, 0.40 × 0.10). In this case 0.04 is the absolute error. In general, the relative error should be ≤ 0.20.
Acceptable values: ≥ 0 and ≤ 1.
Stratum A subgroup of the population under study.
Number of individuals The number of individuals in each stratum.
Acceptable values: any positive integer.
Expected prevalence The assumed level of prevalence of the event in each stratum.
Acceptable values: ≥ 0 and ≤ 1.
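
The inputs listed above can be combined in a short script. The following is a minimal sketch, not the implementation used by any particular tool. It assumes the standard stratified-sampling formula for a proportion with weights wi = ni / N, and it assumes that the relative error is converted to an absolute error by multiplying it by the population-weighted prevalence. The function names z_value and stratified_sample_size are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal quantile for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def stratified_sample_size(sizes, prevalences, confidence, relative_error):
    """Return (n, AE) for a stratified random sample.

    Assumed formula (not confirmed by the text above):
        n = sum(ni^2 * pi * (1 - pi) / wi)
            / (N^2 * AE^2 / z^2 + sum(ni * pi * (1 - pi)))
    with wi = ni / N.
    """
    if len(sizes) != len(prevalences) or not sizes:
        raise ValueError("sizes and prevalences must be non-empty and equal in length")
    if any(n_i <= 0 for n_i in sizes):
        raise ValueError("each stratum must contain a positive number of individuals")
    if any(not 0 <= p <= 1 for p in prevalences):
        raise ValueError("each expected prevalence must lie between 0 and 1")

    N = sum(sizes)
    weights = [n_i / N for n_i in sizes]  # assumed: proportional weights

    # Assumption: the relative error applies to the population-weighted prevalence.
    overall_p = sum(w * p for w, p in zip(weights, prevalences))
    AE = relative_error * overall_p

    z = z_value(confidence)
    numerator = sum(
        n_i ** 2 * p * (1 - p) / w
        for n_i, p, w in zip(sizes, prevalences, weights)
    )
    denominator = N ** 2 * AE ** 2 / z ** 2 + sum(
        n_i * p * (1 - p) for n_i, p in zip(sizes, prevalences)
    )
    if denominator == 0:
        raise ValueError("sample size is undefined for these inputs")

    # Round up so the requested precision is not undershot.
    return math.ceil(numerator / denominator), AE


if __name__ == "__main__":
    # Two strata of 600 and 400 individuals with expected prevalences 0.30 and 0.55.
    n, AE = stratified_sample_size([600, 400], [0.30, 0.55], 0.95, 0.10)
    print(f"absolute error = {AE:.3f}, sample size = {n}")
```

With these example inputs the weighted prevalence is 0.40, so a relative error of 0.10 corresponds to an absolute error of 0.04, and the sketch returns a total sample size of 351 under the stated assumptions.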