Compare two prevalences: sample size

Theory

This procedure determines the number of animals to include in representative samples of two populations in order to test whether the estimated prevalences differ significantly from each other. When comparing prevalences one might want to: (1) detect a difference between the two prevalences regardless of its direction (p1 <> p2, a two-sided comparison); or (2) detect a difference in a specified direction (p1 > p2 or p1 < p2, a one-sided comparison).

Example: a new therapy has been developed to treat a condition. The new therapy is more expensive but apparently more effective than conventional therapy. Its use would be justified only if the prevalence of the disease is reduced by more than 10% compared with that observed in the population treated with conventional therapy. If a clinical trial were carried out, one would want to detect differences of 10% or more, assuming that the prevalence in the population treated with the new therapy can only be the same as or lower than in the population treated conventionally (a one-sided comparison).

If the samples from both populations contain too few animals, a real difference in prevalence may fail to reach statistical significance. Conversely, if the samples are too large, differences of no clinical importance may be declared significant.

The result depends largely on the difference set as the threshold (the larger the difference, the smaller the required sample size), on whether the direction of the difference is considered (a one-sided comparison requires a smaller sample than a two-sided one), on the level of confidence, and on the stated power. The sample size (n) is calculated in two stages: first n' is computed, and then n' is adjusted, as follows:
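
A standard two-stage calculation that uses exactly these quantities is the normal-approximation formula with a continuity correction, commonly attributed to Fleiss. It is given here as an assumption about the intended formula, not as a confirmed transcription:

```latex
n' = \frac{\left[ z_{\alpha}\sqrt{2P(1-P)} + z_{\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right]^2}{(p_1 - p_2)^2}

n = \frac{n'}{4}\left[ 1 + \sqrt{1 + \frac{4}{n'\,\lvert p_1 - p_2 \rvert}} \right]^2
```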

Where:
zalpha The value from the standard normal distribution for the chosen level of confidence. For 90% confidence, z is 1.645 when the direction of the difference does not matter (two-sided) and 1.282 when only one direction matters (one-sided). The corresponding values are 1.96 and 1.645 for 95% confidence, and 2.58 and 2.326 for 99% confidence.
zbeta The value from the standard normal distribution for the chosen level of power. For powers of 80%, 85%, 90% and 95% the corresponding values are 0.84, 1.04, 1.28 and 1.64 respectively.
P The mean of the two expected prevalences: (p1 + p2) / 2.
p1 and p2 The expected prevalences in samples 1 and 2.
n The required number of animals for each sample.
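
As a minimal sketch, the two-stage calculation can be written in Python. It assumes the continuity-corrected normal-approximation formula (commonly attributed to Fleiss), which uses the symbols defined above but is not confirmed by this text. The function name `sample_size_two_prevalences` and its parameters are hypothetical. The z values are computed with the standard library's `statistics.NormalDist` rather than taken from the rounded values listed above, so results may differ by an animal or so from a calculation based on the rounded values.

```python
import math
from statistics import NormalDist


def sample_size_two_prevalences(p1, p2, confidence=0.95, power=0.80, two_sided=True):
    """Return (n_prime, n): animals per sample before and after adjustment.

    Assumed formula: normal approximation plus a continuity correction.
    """
    if not (0 <= p1 <= 1 and 0 <= p2 <= 1):
        raise ValueError("prevalences must lie between 0 and 1")
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")

    alpha = 1 - confidence
    std_normal = NormalDist()
    # A two-sided comparison splits alpha between both tails; a one-sided one does not.
    z_alpha = std_normal.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = std_normal.inv_cdf(power)

    p_mean = (p1 + p2) / 2  # P in the definitions above
    diff = abs(p1 - p2)

    # Stage 1: unadjusted sample size n'.
    n_prime = (
        z_alpha * math.sqrt(2 * p_mean * (1 - p_mean))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2 / diff ** 2

    # Stage 2: adjust n' with the continuity correction to obtain n.
    n = n_prime / 4 * (1 + math.sqrt(1 + 4 / (n_prime * diff))) ** 2

    # Round up, since a fraction of an animal cannot be sampled.
    return math.ceil(n_prime), math.ceil(n)


# Hypothetical inputs: prevalences of 30% and 20%, 95% confidence, 80% power.
print(sample_size_two_prevalences(0.30, 0.20))                   # two-sided
print(sample_size_two_prevalences(0.30, 0.20, two_sided=False))  # one-sided
```

With these inputs the one-sided call returns a smaller sample than the two-sided call, in line with the remark that one direction requires fewer animals than two.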

Practice

Expected prevalence 1 The assumed level of prevalence that the event may have in sample 1.
Acceptable values: ≥ 0 and ≤ 1.
Expected prevalence 2 The assumed level of prevalence that the event may have in sample 2.
Acceptable values: ≥ 0 and ≤ 1.
Level of confidence The confidence that the user wants to have in the results.
Acceptable values: 90%, 95% or 99%.
Power The probability of not making a Type II error. A Type II error occurs when the null hypothesis is not rejected even though it is false. Here that would mean concluding that there is no significant difference between the two prevalences when they actually differ. A power of 80% is advised.
Acceptable values: 80%, 85%, 90% or 95%.