The estimated prevalence of an event is calculated as follows:
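Assuming the usual ratio estimator (total events over total individuals sampled), and using the symbols defined in the list further down, the estimate can be written as:

```latex
p = \frac{\sum_{i=1}^{c} e_i}{\sum_{i=1}^{c} n_i} = \frac{\sum_{i=1}^{c} e_i}{n_{total}}
```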
To determine the confidence interval of a proportion estimated from two-stage sampling, the standard error of the proportion must first be calculated, based on the following equation:
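A form consistent with the symbols listed below is the standard ratio-estimator formula for cluster samples. It is offered as an assumption about the intended equation, not a confirmed transcription:

```latex
SE(p) = \frac{c}{n_{total}} \sqrt{\frac{\sum_{i=1}^{c} \left(e_i - n_i \, p\right)^2}{c \, (c - 1)}}
```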
Where:
| c | The number of clusters included in the sample. |
| n_total | The total number of individuals included in the sample. |
| n_i | The number of individuals in the sample belonging to cluster i. |
| e_i | The number of individuals with the event in the sample belonging to cluster i. |
| p | The estimated proportion of individuals with the event in the population. |
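As a minimal sketch of the two calculations described above, the Python function below computes the prevalence and its standard error from per-cluster counts. It assumes the ratio-estimator form of the standard error; the function name `cluster_prevalence` and the example counts are hypothetical and are not part of the program.

```python
import math


def cluster_prevalence(samples, events):
    """Return (p, se) for a two-stage sample.

    samples[i] is the number of individuals sampled in cluster i (n_i);
    events[i] is the number of those individuals with the event (e_i).
    """
    if len(samples) != len(events):
        raise ValueError("samples and events must have the same length")
    c = len(samples)
    if c < 2:
        raise ValueError("at least two clusters are needed")

    n_total = sum(samples)
    p = sum(events) / n_total

    # Squared deviations of observed events from those expected (n_i * p).
    ss = sum((e - n * p) ** 2 for n, e in zip(samples, events))
    se = (c / n_total) * math.sqrt(ss / (c * (c - 1)))
    return p, se


# Hypothetical data: four clusters of 10 individuals each.
p, se = cluster_prevalence([10, 10, 10, 10], [2, 4, 1, 3])
print(f"prevalence = {p:.4f}, standard error = {se:.4f}")
```

With equal cluster sizes, as in this example, the standard error reduces to the standard error of the mean of the per-cluster proportions.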
The confidence interval is calculated as follows:
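Assuming the usual normal-approximation interval, with SE(p) denoting the standard error of the proportion described above, the limits are:

```latex
p \pm z \times SE(p)
```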
Where:
| p | The estimated prevalence in the population. |
| z | The critical value from the standard normal distribution for the chosen level of confidence. The levels of confidence most often used in biological studies are 90%, 95% and 99%, for which z is 1.64, 1.96 and 2.58 respectively. |
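The interval can be sketched in Python as follows. This assumes the normal-approximation form p ± z × SE and uses the z values quoted in the table; the names `Z_VALUES` and `confidence_interval` and the input figures are illustrative only.

```python
# z values for the three supported confidence levels, as quoted in the text.
Z_VALUES = {90: 1.64, 95: 1.96, 99: 2.58}


def confidence_interval(p, se, level=95):
    """Return (lower, upper) limits for prevalence p with standard error se."""
    if level not in Z_VALUES:
        raise ValueError("level must be 90, 95 or 99")
    z = Z_VALUES[level]
    return p - z * se, p + z * se


# Hypothetical inputs: prevalence 0.25 with a standard error of 0.0645.
lower, upper = confidence_interval(0.25, 0.0645, level=95)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

This sketch does not truncate the limits, so with a small prevalence or a large standard error the lower limit can fall below 0 (or the upper limit above 1).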
The design effect (D) is the ratio of the variance of the estimate under the two-stage design to its variance under simple random sampling:
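Taking D, by the usual convention, as a ratio of squared standard errors (variances), and assuming the simple-random-sampling variance of a proportion is p(1 - p)/n_total, this can be written as:

```latex
D = \frac{SE(p)^2}{p \, (1 - p) / n_{total}}
```

Here SE(p), p and n_total are the quantities defined for the standard error above.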
The rate of homogeneity (roh) is calculated as follows:
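Assuming the usual relationship between the two quantities, D = 1 + (m - 1) roh, solving for roh gives:

```latex
roh = \frac{D - 1}{m - 1}
```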
Where:
| m | The average number of individuals per cluster. |
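A minimal Python sketch of both quantities follows. It assumes D is the ratio of the two-stage variance to the simple-random-sampling variance p(1 - p)/n_total, and that roh = (D - 1)/(m - 1); the function names and input values are hypothetical.

```python
def design_effect(p, se, n_total):
    """Ratio of the two-stage variance (se squared) to the SRS variance."""
    var_srs = p * (1 - p) / n_total
    if var_srs == 0:
        raise ValueError("design effect is undefined when p is 0 or 1")
    return se ** 2 / var_srs


def rate_of_homogeneity(d, n_total, c):
    """Rate of homogeneity from design effect d and average cluster size m."""
    m = n_total / c  # average number of individuals per cluster
    if m <= 1:
        raise ValueError("average cluster size must exceed 1")
    return (d - 1) / (m - 1)


# Hypothetical inputs: p = 0.25, SE = 0.1, 40 individuals in 4 clusters.
d = design_effect(0.25, 0.1, 40)
roh = rate_of_homogeneity(d, 40, 4)
print(f"D = {d:.3f}, roh = {roh:.3f}")
```

A value of D above 1 indicates that clustering has inflated the variance relative to a simple random sample of the same size.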
The data are loaded into the program from a comma-separated values (*.csv) file. Each row of data should correspond to one cluster (farm or village). The data file should have three columns:
| id | The identification of the cluster (any alphanumeric characters). |
| samples | The number of samples taken in the cluster. |
| events | The number of events detected in the cluster. |
In addition to the data file, the program requires the following input:
| Level of confidence | The level of confidence required for the results. Accepted values: 90%, 95% or 99%. |
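To illustrate the expected file layout, the Python sketch below parses a small data set with the standard `csv` module. It assumes the file starts with a header row naming the columns `id`, `samples` and `events`, which the description above does not confirm; the function name `read_clusters` and the farm identifiers are made up for the example.

```python
import csv
import io


def read_clusters(handle):
    """Read (id, samples, events) tuples from an open CSV file.

    Assumes a header row with the column names id, samples and events.
    """
    clusters = []
    for row in csv.DictReader(handle):
        samples = int(row["samples"])
        events = int(row["events"])
        if events > samples:
            raise ValueError(f"cluster {row['id']}: more events than samples")
        clusters.append((row["id"], samples, events))
    return clusters


# Hypothetical file contents: one row per cluster.
example = io.StringIO(
    "id,samples,events\n"
    "farm01,10,2\n"
    "farm02,12,0\n"
    "farm03,8,3\n"
)
clusters = read_clusters(example)
print(clusters)
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`, as the `csv` module documentation recommends.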
Instituto Nacional de Tecnología Agropecuaria
EpiCentre, IVABS, Massey University