Estimate prevalence: confidence interval for a two-stage sample

Theory

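The estimated prevalence of an event is the total number of individuals with the event divided by the total number of individuals sampled:

$$p = \frac{\sum_{i=1}^{c} e_i}{n_{total}}$$

To determine the confidence interval of a proportion estimated from two-stage sampling, the standard error of the proportion, SE(p), must first be calculated:

$$SE(p) = \frac{c}{n_{total}} \sqrt{\frac{\sum_{i=1}^{c} (e_i - n_i p)^2}{c\,(c - 1)}}$$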

Where:
c The number of clusters included in the sample.
n_total The total number of individuals included in the sample.
n_i The number of individuals in the sample belonging to cluster i.
e_i The number of individuals with the event in the sample belonging to cluster i.
p The estimated proportion of individuals with the event in the population.
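
As an illustration, the prevalence and its two-stage standard error can be sketched in Python. This is a minimal sketch, not the program's own code. It assumes the standard ratio-estimator form of the standard error for cluster samples, SE(p) = (c / n_total) * sqrt(sum((e_i - n_i * p)^2) / (c * (c - 1))). The function name two_stage_prevalence and the example counts are hypothetical.

```python
import math


def two_stage_prevalence(samples, events):
    """Return (p, se) from per-cluster sample sizes and event counts.

    samples[i] is n_i and events[i] is e_i for cluster i.
    Assumes the ratio-estimator standard error for cluster samples.
    """
    if len(samples) != len(events):
        raise ValueError("samples and events must have the same length")
    c = len(samples)                      # number of clusters
    if c < 2:
        raise ValueError("at least two clusters are needed")
    n_total = sum(samples)                # total individuals sampled
    p = sum(events) / n_total             # estimated prevalence
    # Sum of squared deviations of observed from expected events per cluster
    ss = sum((e - n * p) ** 2 for n, e in zip(samples, events))
    se = (c / n_total) * math.sqrt(ss / (c * (c - 1)))
    return p, se


# Hypothetical data: four clusters
samples = [10, 12, 8, 10]
events = [2, 3, 1, 4]
p, se = two_stage_prevalence(samples, events)
print(f"p = {p:.4f}, SE = {se:.4f}")      # p = 0.2500, SE = 0.0540
```

At least two clusters are required because the denominator c * (c - 1) is zero for a single cluster.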

The confidence interval is calculated as follows:

$$p \pm z \times SE(p)$$

Where:
p The estimated prevalence in the population.
SE(p) The standard error of the estimated prevalence under the two-stage design.
z The value from the standard normal distribution corresponding to the chosen level of confidence. The levels of confidence most frequently used in biological studies are 90%, 95% and 99%, for which z is 1.64, 1.96 and 2.58 respectively.
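
The interval calculation can be sketched as follows. This is a minimal sketch assuming the usual normal-approximation interval, p plus or minus z times the standard error, with the three z values quoted above. The names Z_VALUES and confidence_interval and the input values are hypothetical.

```python
# z values for the three supported confidence levels (percent)
Z_VALUES = {90: 1.64, 95: 1.96, 99: 2.58}


def confidence_interval(p, se, level=95):
    """Return (lower, upper) as p -/+ z * se for the given confidence level."""
    if level not in Z_VALUES:
        raise ValueError("level must be 90, 95 or 99")
    z = Z_VALUES[level]
    return p - z * se, p + z * se


# Hypothetical estimate: prevalence 0.25 with standard error 0.054
lower, upper = confidence_interval(0.25, 0.054, level=95)
print(f"95% CI: {lower:.4f} to {upper:.4f}")   # 95% CI: 0.1442 to 0.3558
```

The sketch does not truncate the limits, so with a prevalence close to 0 or 1 a limit can fall outside the range 0 to 1.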

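The design effect (D) is the ratio of the variance (the squared standard error) under the two-stage design to the variance expected under simple random sampling with the same total sample size:

$$D = \frac{SE(p)^2}{p\,(1 - p) / n_{total}}$$

Where:
SE(p) The standard error of the estimated prevalence under the two-stage design.
p(1 - p)/n_total The variance of the estimated prevalence under simple random sampling.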

The rate of homogeneity (roh) is calculated as follows:

$$roh = \frac{D - 1}{m - 1}$$

Where:
D The design effect.
m The average number of individuals per cluster (n_total / c).
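
The design effect and rate of homogeneity can be sketched as follows. This is a minimal sketch. It assumes that D compares variances (squared standard errors), which is the convention under which roh = (D - 1) / (m - 1) holds, and that the simple random sampling variance is p(1 - p) / n_total. The function names and input values are hypothetical.

```python
def design_effect(p, se, n_total):
    """Ratio of the two-stage variance to the simple random sampling variance."""
    var_srs = p * (1 - p) / n_total       # variance under simple random sampling
    if var_srs == 0:
        raise ValueError("design effect is undefined when p is 0 or 1")
    return se ** 2 / var_srs


def rate_of_homogeneity(d, n_total, c):
    """roh = (D - 1) / (m - 1), with m the average cluster size."""
    m = n_total / c
    if m == 1:
        raise ValueError("roh is undefined when the average cluster size is 1")
    return (d - 1) / (m - 1)


# Hypothetical values: p = 0.25, SE = 0.1, 40 individuals in 4 clusters
d = design_effect(0.25, 0.1, 40)
roh = rate_of_homogeneity(d, 40, 4)
print(f"D = {d:.3f}, roh = {roh:.3f}")    # D = 2.133, roh = 0.126
```

A value of D above 1 means the clustered sample is less precise than a simple random sample of the same size.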

Practice

The data are entered into the program via a comma-separated values (*.csv) file. Each row of data should correspond to a cluster (for example, a farm or village). The data file should have three columns:

id The identification of the cluster (any alphanumeric characters).
samples The number of samples taken in the cluster.
events The number of events detected in the cluster.

In addition to the data file, the program takes one further input:

Level of confidence The confidence that the user wants to have in the results. Acceptable values: 90%, 95% or 99%.
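
For readers who want to check results outside the program, the whole calculation can be sketched in Python from a file in the layout described above. This is a minimal sketch and not the program's own implementation. It assumes the file has a header row with the column names id, samples and events, and it uses the ratio-estimator standard error for cluster samples. The function name analyse and the example data are hypothetical.

```python
import csv
import io
import math

# z values for the three accepted confidence levels (percent)
Z_VALUES = {90: 1.64, 95: 1.96, 99: 2.58}


def analyse(csv_file, level=95):
    """Read cluster rows (id, samples, events) and return the estimates."""
    rows = list(csv.DictReader(csv_file))
    samples = [int(r["samples"]) for r in rows]
    events = [int(r["events"]) for r in rows]
    c = len(rows)                          # one row per cluster
    if c < 2:
        raise ValueError("at least two clusters are needed")
    n_total = sum(samples)
    p = sum(events) / n_total
    ss = sum((e - n * p) ** 2 for n, e in zip(samples, events))
    se = (c / n_total) * math.sqrt(ss / (c * (c - 1)))
    z = Z_VALUES[level]
    return {"p": p, "se": se, "lower": p - z * se, "upper": p + z * se}


# Hypothetical file content; a real file would be opened with open(path, newline="")
example = io.StringIO(
    "id,samples,events\n"
    "farm1,10,2\n"
    "farm2,12,3\n"
    "farm3,8,1\n"
    "farm4,10,4\n"
)
result = analyse(example, level=95)
print(result)
```

If the file has no header row, the column names can be supplied through the fieldnames argument of csv.DictReader instead.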