A Lexicon for Public Health Students: The Design Effect

Reblogged from my previous post on the Community Medicine Education Blog, as part of the new series in which I try to demystify things that confuse… mainly me!

Of late, the design effect seems to get a lot of attention in all our Journal Clubs, so much so that there have been talks of holding a short session on the design effect alone! Here goes my attempt to talk about this much-discussed topic!



Like the British politician (and two-time Prime Minister) Benjamin Disraeli, I too hate definitions, so I shall try to make this as painless as possible!

In survey-based studies, a complex sampling design, such as cluster sampling or stratified sampling, is almost always used. In simple random sampling (SRS), every member of the population being studied has an equal probability of being selected; in these complex designs, by contrast, there are almost always unequal probabilities of selection, clustering of the selected subjects, or both. Thus, depending on the design adopted for a study, the study sample deviates from the sample that simple random sampling would have produced.

Thus, the design effect is a factor that measures how far the precision of estimates from a particular study sample (selected by a defined design) departs from the precision that a simple random sample of the same size, drawn from the whole target population, would have given.

Kish (1) and Moser and Kalton (2) define the design effect with much greater mathematical precision: it is the ratio of the variance of the estimated outcome under the cluster sampling design, $\mathrm{Var}_{\text{clus}}(\hat{\theta})$, to the variance of the same estimate that would be expected if the same number of individuals were selected by simple random sampling, $\mathrm{Var}_{\text{SRS}}(\hat{\theta})$.

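Simply put,

$$\mathrm{Deff} = \frac{\mathrm{Var}_{\text{clus}}(\hat{\theta})}{\mathrm{Var}_{\text{SRS}}(\hat{\theta})}$$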

So basically, the design effect measures the impact that deviating from the simple random sampling design has on the precision of our estimates.
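
To make the variance-ratio definition concrete, here is a small Monte Carlo sketch in Python. It builds a hypothetical population of 300 clusters of 25 people each (all numbers are illustrative assumptions, loosely echoing a tuhet-like setting), then repeatedly draws either 40 whole clusters or a simple random sample of the same 1000 people, and divides the variance of the cluster-sample means by the variance of the SRS means. The between- and within-cluster standard deviations are chosen so that the population ICC is about 0.1, and the helper names `make_population` and `simulated_deff` are hypothetical.

```python
import random
import statistics

# Hypothetical population settings (illustrative assumptions, not from any study).
N_CLUSTERS, CLUSTER_SIZE = 300, 25
SD_BETWEEN, SD_WITHIN = 1.0, 3.0  # ICC = 1 / (1 + 9) = 0.1


def make_population(rng):
    """Each person's value = shared cluster effect + individual noise."""
    population = []
    for _ in range(N_CLUSTERS):
        cluster_effect = rng.gauss(0, SD_BETWEEN)
        population.append(
            [cluster_effect + rng.gauss(0, SD_WITHIN) for _ in range(CLUSTER_SIZE)]
        )
    return population


def simulated_deff(clusters_drawn=40, reps=1000, seed=1):
    """Var(mean | cluster sample) / Var(mean | SRS of the same number of people)."""
    rng = random.Random(seed)
    population = make_population(rng)
    everyone = [x for cluster in population for x in cluster]
    n_people = clusters_drawn * CLUSTER_SIZE  # same sample size in both designs
    cluster_means, srs_means = [], []
    for _ in range(reps):
        chosen = rng.sample(population, clusters_drawn)  # whole clusters
        cluster_means.append(statistics.fmean(x for c in chosen for x in c))
        srs_means.append(statistics.fmean(rng.sample(everyone, n_people)))
    return statistics.variance(cluster_means) / statistics.variance(srs_means)


print(round(simulated_deff(), 2))
```

Both designs use the same sampling fraction (40 of 300 clusters, 1000 of 7500 people), so the finite-population corrections cancel in the ratio. With these settings the result should land near $1 + (25 - 1) \times 0.1 = 3.4$, give or take simulation noise.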

An Alternate Definition:

It can also be shown (and this form is used more often in social studies) that, for clusters of roughly equal size,

$$\mathrm{Deff} = 1 + (n - 1)\rho$$

where $n$ is the average cluster size and $\rho$ is the intra-class correlation coefficient (ICC) of the outcome variable in question.
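
The alternate formula, $\mathrm{Deff} = 1 + (n - 1)\rho$, turns into a one-line helper. A minimal Python sketch, assuming clusters of roughly equal size (the function name `design_effect` is hypothetical):

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect for cluster sampling: Deff = 1 + (n - 1) * rho."""
    if avg_cluster_size < 1:
        raise ValueError("average cluster size must be at least 1")
    return 1 + (avg_cluster_size - 1) * icc


# Even a small ICC matters when clusters are large.
for n, rho in [(25, 0.0), (25, 0.02), (25, 0.1), (5, 0.1)]:
    print(f"n = {n:>2}, ICC = {rho:.2f} -> Deff = {design_effect(n, rho):.2f}")
```

The same ICC of 0.1 gives a Deff of 3.4 with clusters of 25 but only 1.4 with clusters of 5, which is why large clusters are so costly in precision.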

Intra-class Correlation Coefficient:

Now as the above equation clearly shows, the design effect depends on two factors:

– the average size of the cluster

– the intra-class correlation coefficient (ICC)

So what is this ICC?

Simply put, the ICC measures how much more similar members of the same cluster are to each other than to members of other clusters. When members within a cluster closely resemble each other, the ICC is high.

However, a variable will not always show similar trends within a cluster. Let us consider an example. In the hypertension study from our journal club discussion (3), a two-stage cluster sampling was employed. There were 308 tuhets (extended joint families) on the whole island of Car Nicobar. In the first stage, 40 of these were selected randomly, and in the second stage, all the members aged >18 years in every selected tuhet were included. In this way, the investigators recruited almost 1000 subjects.

With simple random sampling, these 1000 subjects would have been drawn from across all 308 tuhets, capturing much more of the population's diversity. Selecting only 40 tuhets in the first stage made life easier for the investigators, but it eroded the statistical power of the study because of this deviation from simple random sampling, which, as defined above, is quantified by the design effect.

Now, members of the same tuhet would be expected to share food habits, salt intake and certain other risk factors, so the ICC for these variables would be high. On a dot plot, members of the same cluster would show clustering of their values, something like this (source: Wikipedia):

[Figure: dot plot of a variable with a high ICC; values from the same cluster sit close together]

On the other hand, some other features, like physical activity or disability rates, may not show such clustering. Plotted on the same kind of dot plot, they would look something like this, with a low ICC (source: Wikipedia):

[Figure: dot plot of a variable with a low ICC; values from different clusters overlap]

Hence, the question that naturally arises next is: how to estimate the ICC?

Determining the Intraclass Correlation Coefficient:

Before we get into how to derive the ICC, I would like to point out that in cluster sampling, variability is introduced at two levels, whereas in simple random sampling there is only one.

In an SRS design, the subjects vary only at the individual level. In a cluster sampling design, however, there is variability between the clusters (between tuhets) as well as within the clusters (between individuals of the same tuhet).

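Using this conceptual model, with $\sigma_b^2$ the variance between clusters and $\sigma_w^2$ the variance within clusters, the ICC is defined as:

$$\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}$$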

Clearly, from this equation, the ICC ranges from 0 to 1. In the extreme case where the ICC approaches 1, the variance within the clusters must approach 0, which means that almost all the members of a cluster give the same response. The effective sample size (whose significance is discussed later) then shrinks to the number of clusters, and the design effect rises to its maximum, the average cluster size.

Conversely, a very low ICC means that the variance between the clusters is much smaller than the variance within the clusters.

Finally, in the extreme case where the ICC is 0, the responses of members of the same cluster are not correlated at all! This effectively means a design effect of 1.
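
The ICC formula is written in terms of population variance components, which we never observe directly. One common way to estimate them (not necessarily the method used in the papers cited here) is the one-way analysis of variance (ANOVA) estimator: from the between-cluster and within-cluster mean squares, MSB and MSW, it estimates $\sigma_w^2$ as MSW and $\sigma_b^2$ as $(\mathrm{MSB} - \mathrm{MSW})/n_0$, where $n_0$ is an adjusted average cluster size that allows for unequal clusters, and then plugs both into $\rho = \sigma_b^2/(\sigma_b^2 + \sigma_w^2)$. A Python sketch with hypothetical toy data (the function name `anova_icc` is also hypothetical):

```python
from statistics import fmean


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC from a list of clusters (lists of values)."""
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    total_n = sum(sizes)
    if k < 2 or total_n <= k:
        raise ValueError("need at least 2 clusters and more people than clusters")
    grand_mean = fmean(x for c in clusters for x in c)
    means = [fmean(c) for c in clusters]
    # Between- and within-cluster mean squares.
    msb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / (k - 1)
    msw = sum((x - m) ** 2 for c, m in zip(clusters, means) for x in c) / (total_n - k)
    # Adjusted average cluster size (equals the common size when clusters are equal).
    n0 = (total_n - sum(n * n for n in sizes) / total_n) / (k - 1)
    var_between = max((msb - msw) / n0, 0.0)  # negative estimates truncated to 0
    var_within = msw
    if var_between + var_within == 0:
        raise ValueError("all values identical: ICC is undefined")
    return var_between / (var_between + var_within)


# Hypothetical toy data: three clusters of three people each.
similar_within = [[10, 10, 11], [20, 21, 20], [30, 30, 31]]  # values cluster by group
mixed_within = [[1, 5, 9], [2, 5, 8], [3, 5, 7]]  # groups overlap completely
print(round(anova_icc(similar_within), 3))
print(round(anova_icc(mixed_within), 3))
```

The first dataset gives an ICC close to 1 and the second gives 0, mirroring the high-ICC and low-ICC patterns described earlier. Truncating a negative between-cluster estimate at zero is one common convention; some analysts report the negative value instead.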

In social studies, the ICC typically ranges between 0.01 and 0.02 (4), but it is advisable to actually calculate it from a pilot study. More and more studies nowadays report design effects or ICCs post hoc. For example, the paper on road traffic injuries that we discussed in last week's journal club (5) calculated the Deff for its various outcomes and found values ranging from 1.2 to 2.24, although the authors had assumed 1.5 overall. Hence, the study was underpowered for outcomes whose Deff exceeded 1.5, while it was adequately powered for the others. Few studies, however, publish ICC values from a pilot study (6), probably because they are not calculated.
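
The arithmetic behind "underpowered for some outcomes" can be sketched as follows. If a study inflates its simple-random-sampling sample size $n_{\text{SRS}}$ by an assumed Deff, the effective sample size it actually achieves for an outcome is $n_{\text{SRS}} \times \mathrm{Deff}_{\text{assumed}} / \mathrm{Deff}_{\text{observed}}$, that is, a fraction $\mathrm{Deff}_{\text{assumed}}/\mathrm{Deff}_{\text{observed}}$ of what was needed. The Python snippet below applies this to the assumed 1.5 and the observed range of 1.2 to 2.24 quoted above; the function name `achieved_fraction` is hypothetical, and sample size is treated as the only driver of power.

```python
def achieved_fraction(assumed_deff: float, observed_deff: float) -> float:
    """Effective sample size achieved, as a fraction of the SRS size that was needed.

    Planned size = n_srs * assumed_deff; effective size = planned size / observed_deff.
    """
    return assumed_deff / observed_deff


for observed in (1.2, 1.5, 2.24):
    frac = achieved_fraction(1.5, observed)
    verdict = "adequate" if frac >= 1 else "underpowered"
    print(f"observed Deff {observed}: {frac:.0%} of the needed effective sample ({verdict})")
```

For the outcome with a Deff of 2.24, the study effectively had only about two-thirds (67%) of the sample it needed, while for the outcome with a Deff of 1.2 it had 125%.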

Note: There are extensive equations for deriving the variance between groups, but I am not going into them, mainly because I did not understand them. I am still a newbie at this stuff, so I hope you will forgive these omissions. If, however, you are interested, do drop me a line and I shall mail you the papers discussing the derivation. You will have to make me understand them thereafter!


The Effective Sample Size (ESS):

So, it follows from the above discussion that, for the same number of subjects, cluster sampling is statistically less efficient than simple random sampling. Although investigators may feel that they have recruited an adequate number of subjects (having calculated the sample size using the standard formulae), they have, effectively, recruited far fewer than they originally aimed for.

So, if there are $m$ members in each cluster and a total of $k$ clusters, the actual study sample size is $mk$. However, once the design effect (Deff) of cluster sampling is taken into consideration, the effective sample size (ESS) is smaller, and it is given by the simple equation:

$$\mathrm{ESS} = \frac{mk}{\mathrm{Deff}}$$

So, the smaller the design effect, the larger the effective sample size!
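
Combining $\mathrm{ESS} = mk/\mathrm{Deff}$ with the alternate formula $\mathrm{Deff} = 1 + (m - 1)\rho$ gives a quick way to see how fast the effective sample shrinks. The Python sketch below uses figures loosely based on the tuhet example (about 1000 subjects in 40 clusters, so $m = 25$); the ICC values are hypothetical, equal cluster sizes are assumed, and the function name `effective_sample_size` is hypothetical.

```python
def effective_sample_size(m: int, k: int, icc: float) -> float:
    """ESS = m*k / Deff, with Deff = 1 + (m - 1) * icc (equal cluster sizes assumed)."""
    deff = 1 + (m - 1) * icc
    return m * k / deff


for icc in (0.0, 0.02, 0.05, 1.0):
    print(f"ICC = {icc:.2f}: ESS = {effective_sample_size(25, 40, icc):.0f} of {25 * 40}")
```

At an ICC of 0.05 the 1000 recruits are worth only about 455 independent subjects, and at an ICC of 1 the ESS collapses to the number of clusters (40), matching the extreme case described earlier.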

Design Factor:

Coming to the last and slightly confusing part of this discussion about design effect – the design factor. Unfortunately, in our undergrad days we do not really get a very good grounding in Biostatistics, so my concepts are a little flimsy. I shall try to explain this matter as best as I understand it. If you find any errors in this, please do not hesitate to point them out to me.

The design factor (Deft) is simply the square root of the design effect. So, if the design effect is 4, the Deft is 2.

But what does that mean?

Now, as we have discussed, the design effect tells us how much larger a sample must be to offset the effect of deviating from simple random sampling. The design factor, on the other hand, tells us by how much the standard error (and hence the width of the confidence intervals), calculated as if the sample were a simple random sample, must be multiplied to reflect the actual design.
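
In practice, "how much larger a sample must be" means multiplying the sample size from a standard formula by Deff. A minimal Python sketch, assuming the usual formula for estimating a single proportion, $n = z^2 p(1 - p)/d^2$; the prevalence, precision, Deff and cluster size below are hypothetical, as is the function name `cluster_sample_size`.

```python
import math


def cluster_sample_size(p: float, d: float, deff: float, cluster_size: int, z: float = 1.96):
    """Return (total subjects, clusters needed) for estimating a proportion p to within +/- d."""
    n_srs = z ** 2 * p * (1 - p) / d ** 2  # standard SRS formula
    n_total = math.ceil(n_srs * deff)  # inflate by the design effect
    n_clusters = math.ceil(n_total / cluster_size)  # whole clusters to recruit
    return n_total, n_clusters


print(cluster_sample_size(p=0.3, d=0.05, deff=2.0, cluster_size=25))  # (646, 26)
```

Without the design effect, the same precision would need only 323 subjects; a Deff of 2 roughly doubles that, and recruiting in clusters of 25 means 26 clusters.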

Now a new question arises: how to interpret these results?

If Deff is 4, the sample size required for an adequately powered study would be 4 times that calculated by the standard formulae. And since Deft would be 2, the confidence intervals of the cluster-sampled study would be twice as wide as those from a simple random sample of the same size, so confidence intervals computed with simple random sampling formulae must be doubled.
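
The Deft adjustment can be sketched in a few lines of Python: compute the standard error as if the data were a simple random sample, multiply it by $\mathrm{Deft} = \sqrt{\mathrm{Deff}}$, and build the confidence interval from the inflated value. The example assumes a hypothetical prevalence of 30% among 1000 subjects, a Deff of 4 and a normal-approximation 95% interval; the function name `adjusted_ci` is hypothetical.

```python
import math


def adjusted_ci(p_hat: float, n: int, deff: float, z: float = 1.96):
    """95% CI for a proportion, widening the SRS standard error by Deft = sqrt(Deff)."""
    se_srs = math.sqrt(p_hat * (1 - p_hat) / n)  # SE if the sample were an SRS
    deft = math.sqrt(deff)
    se_design = deft * se_srs  # SE allowing for the design
    return p_hat - z * se_design, p_hat + z * se_design


naive = adjusted_ci(0.3, 1000, deff=1.0)
corrected = adjusted_ci(0.3, 1000, deff=4.0)
print([round(x, 3) for x in naive])  # ignores the clustering
print([round(x, 3) for x in corrected])  # twice as wide
```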

So, in a way, Deft is also a measure of how much the clustering inflates the sampling error.

A design factor of 1 would mean that clustering of the study subjects has a negligible effect on the precision of the results, and that, despite the cluster sample, the study approximates the results that would have been obtained from a simple random sample. The design effect would also be 1 in this case, which, as discussed previously, means an ICC of 0!

A design factor greater than 1 means that the results from the cluster sample have larger standard errors than a simple random sample would have given. Hence, if tests of significance are applied without adjusting for this, the standard errors are underestimated and non-significant results may be falsely reported as significant, giving rise to Type I or alpha error (the null hypothesis is true, but erroneously rejected).

A design factor less than 1 (more typical of well-stratified designs than of cluster samples) means that the results have smaller standard errors than a simple random sample would have given. Hence, if tests of significance are applied without adjusting for this, the standard errors are overestimated and significant results may be falsely reported as non-significant, giving rise to Type II or beta error (the null hypothesis is false, but erroneously not rejected).
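
How badly can an unadjusted test mislead? Assuming the test statistic is approximately normal and the true standard error is Deft times the naive (SRS) one, a nominal two-sided 5% test actually rejects a true null hypothesis with probability $2[1 - \Phi(1.96/\mathrm{Deft})]$. A short Python sketch, using `math.erf` for the normal CDF; the Deft values are illustrative and the function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def actual_type1_error(deft: float, z_crit: float = 1.96) -> float:
    """True rejection rate of a nominal 5% test that ignores the design factor."""
    return 2 * (1 - normal_cdf(z_crit / deft))


for deft in (0.8, 1.0, 1.5, 2.0):
    print(f"Deft = {deft}: actual alpha = {actual_type1_error(deft):.3f}")
```

With Deft = 2, a true null hypothesis is rejected about a third of the time instead of 5% of the time. With Deft below 1, the unadjusted test becomes conservative (well under 5%), which costs power and makes Type II errors more likely.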

Once again, as with the ICC, only a few variables have a design factor that regularly exceeds 1 by a wide margin. Such a value indicates a large amount of homogeneity within the clusters, which gives rise to a bigger design factor (and design effect). A classic example might be religion or ethnicity/race when households (like tuhets) are the clusters.


1. Kish L. Survey sampling. London: Wiley; 1965:148-81.

2. Moser CA, Kalton G. Survey methods in social investigation. Aldershot: Dartmouth Publishing; 1993:61-78.

3. Manimunda SP, Sugunan AP, Benegal V, Balakrishna N, Rao MV, Pesala KS. Association of hypertension with risk factors & hypertension related behaviour among the aboriginal Nicobarese tribe living in Car Nicobar Island, India. Indian J Med Res. 2011 Mar;133:287-93. PubMed PMID: 21441682; PubMed Central PMCID: PMC3103153.

4. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. American ed. New York, NY: Oxford University Press; 2000:9,112-113.

5. Dandona R, Kumar GA, Ameer MA, Ahmed GM, Dandona L. Incidence and burden of road traffic injuries in urban India. Inj Prev. 2008 Dec;14(6):354-9. PubMed PMID: 19074239; PubMed Central PMCID: PMC2777413.

6. Killip S, Mahfoud Z, Pearce K. What is an intracluster correlation coefficient? Crucial concepts for primary care researchers. Ann Fam Med. 2004 May-Jun;2(3):204-8. PubMed PMID: 15209195; PubMed Central PMCID: PMC1466680.

Skeptic Oslerphile, Scientist at the Indian Council of Medical Research, National Institute of Cholera and Enteric Diseases. Interests include: Emerging Infections, Public Health, Antimicrobial Resistance, One Health and Zoonoses, Diarrheal Diseases, Medical Education, Medical History, Open Access, Healthcare Social Media and Health2.0. Opinions are my own!
