
Case-control and Cohort studies: A brief overview

Posted on 6th December 2017 by Saul Crandon


Introduction

Case-control and cohort studies are observational studies that lie near the middle of the hierarchy of evidence. These types of studies, along with randomised controlled trials, constitute analytical studies, whereas case reports and case series are descriptive studies (1). Although these studies are not ranked as highly as randomised controlled trials, they can provide strong evidence if designed appropriately.

Case-control studies

Case-control studies are retrospective. They clearly define two groups at the start: one with the outcome/disease and one without it. They then look back to assess whether there is a statistically significant difference in the rates of exposure to a defined risk factor between the groups. See Figure 1 for a pictorial representation of a case-control study design. Such a difference can suggest an association between the risk factor and development of the disease in question, although no definitive causality can be drawn. The main outcome measure in case-control studies is the odds ratio (OR).
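To make the odds ratio concrete, here is a minimal Python sketch that computes it from a 2x2 table of exposure by case/control status. The function name and the counts are made up for illustration, and the confidence interval uses the common log-based (Woolf) approximation, which is an assumption of this sketch rather than something the post prescribes.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return the odds ratio and an approximate 95% confidence interval."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    # Odds of exposure among cases divided by odds of exposure among controls
    estimate = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's approximation
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper

# Hypothetical counts: 40 of 100 cases and 20 of 100 controls were exposed
estimate, lower, upper = odds_ratio(40, 60, 20, 80)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An OR above 1 with a confidence interval that excludes 1 would suggest an association between exposure and disease, not proof of causation.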


Figure 1. Case-control study design.

Cases should be selected based on objective inclusion and exclusion criteria from a reliable source such as a disease registry. An inherent issue with selecting cases is that a certain proportion of those with the disease may not have a formal diagnosis, may not present for medical care, may be misdiagnosed or may have died before getting a diagnosis. Regardless of how the cases are selected, they should be representative of the broader disease population that you are investigating to ensure generalisability.

Case-control studies should include two groups that are identical EXCEPT for their outcome / disease status.

As such, controls should also be selected carefully. It is possible to match controls to the cases selected on the basis of various factors (e.g. age, sex) to ensure these do not confound the study results. Choosing up to three or four controls per case may even increase statistical power and study precision (2).
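As a rough illustration of matching, the sketch below picks up to three controls per case from a pool, matching on sex and on age within a tolerance. Everything here is hypothetical (the record layout, the tolerance, the greedy strategy); real studies usually rely on dedicated matching procedures.

```python
import random

def match_controls(cases, pool, k=3, age_tolerance=3, seed=0):
    """Greedily pick up to k controls per case, matched on sex and age."""
    rng = random.Random(seed)
    available = list(pool)
    rng.shuffle(available)  # avoid always picking the first-listed controls
    matches = {}
    for case in cases:
        eligible = [p for p in available
                    if p["sex"] == case["sex"]
                    and abs(p["age"] - case["age"]) <= age_tolerance]
        chosen = eligible[:k]
        for person in chosen:
            available.remove(person)  # each control is used at most once
        matches[case["id"]] = [person["id"] for person in chosen]
    return matches

# Hypothetical data, for illustration only
cases = [{"id": "case1", "sex": "F", "age": 62},
         {"id": "case2", "sex": "M", "age": 45}]
pool = [{"id": f"ctrl{i}", "sex": "F" if i % 2 else "M", "age": 40 + i}
        for i in range(30)]
matches = match_controls(cases, pool)
print(matches)
```

Matching on a factor removes it as a candidate exposure, so it is best reserved for known confounders such as age and sex.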

Case-control studies can provide fast results and they are cheaper to perform than most other studies. Because the analysis is retrospective, rare diseases or diseases with long latency periods can be investigated. Furthermore, you can assess multiple exposures to get a better understanding of possible risk factors for the defined outcome / disease.

Nevertheless, as case-control studies are retrospective, they are more prone to bias. One of the main examples is recall bias. Case-control studies often require the participants to self-report their exposure to a certain factor. Recall bias is the systematic difference in how the two groups may recall past events. For example, in a study investigating stillbirth, a mother who experienced this may recall the possible contributing factors a lot more vividly than a mother who had a healthy birth.

A summary of the pros and cons of case-control studies is provided in Table 1.


Table 1. Advantages and disadvantages of case-control studies.

Cohort studies

Cohort studies can be retrospective or prospective. Retrospective cohort studies are NOT the same as case-control studies.

In retrospective cohort studies, the exposure and outcomes have already happened. They are usually conducted on data that already exist (from prospective studies), and the exposures are defined before looking at the existing outcome data to see whether exposure to a risk factor is associated with a statistically significant difference in the rate at which the outcome develops.

Prospective cohort studies are more common. People are recruited into cohort studies regardless of their exposure or outcome status. This is one of their important strengths. People are often recruited because of their geographical area or occupation, for example, and researchers can then measure and analyse a range of exposures and outcomes.

The study then follows these participants for a defined period to assess the proportion that develop the outcome/disease of interest. See Figure 2 for a pictorial representation of a cohort study design. Therefore, cohort studies are good for assessing prognosis, risk factors and harm. The outcome measure in cohort studies is usually a risk ratio / relative risk (RR).
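As a small worked example, the Python sketch below computes a risk ratio from cohort counts: the risk in the exposed group divided by the risk in the unexposed group. The function name and the numbers are hypothetical.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Return the risk ratio plus the two underlying risks."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed, risk_exposed, risk_unexposed

# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed develop the outcome
rr, risk_exposed, risk_unexposed = relative_risk(30, 1000, 10, 1000)
print(f"Risk (exposed) = {risk_exposed:.3f}, risk (unexposed) = {risk_unexposed:.3f}, RR = {rr:.1f}")
```

An RR of 1 means the risk is the same in both groups; values above 1 suggest the exposure is associated with a higher risk.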


Figure 2. Cohort study design.

Cohort studies should include two groups that are identical EXCEPT for their exposure status.

As a result, both exposed and unexposed groups should be recruited from the same source population. Another important consideration is attrition. If a significant number of participants are not followed up (lost, died or dropped out) then this may impact the validity of the study. Not only does it decrease the study's power, but there may also be attrition bias: a systematic difference between the groups in those who did not complete the study.

Cohort studies can assess a range of outcomes, allowing an exposure to be rigorously assessed for its impact on developing disease. Additionally, they are good for rare exposures, e.g. exposure to a chemical or radiation blast.

Whilst cohort studies are useful, they can be expensive and time-consuming, especially if a long follow-up period is chosen or the disease itself is rare or has a long latency.

A summary of the pros and cons of cohort studies is provided in Table 2.

Table 2. Advantages and disadvantages of cohort studies.

The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement

STROBE provides a checklist of important steps for conducting these types of studies, as well as acting as best-practice reporting guidelines (3). Both case-control and cohort studies are observational, with varying advantages and disadvantages. However, the most important factor in the quality of evidence these studies provide is their methodological quality.

  1. Song J, Chung K. Observational Studies: Cohort and Case-Control Studies. Plastic and Reconstructive Surgery. 2010 Dec;126(6):2234-2242.
  2. Ury HK. Efficiency of case-control studies with multiple controls per case: Continuous or dichotomous data. Biometrics. 1975 Sep;31(3):643-649.
  3. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007 Oct;370(9596):1453-1457. PMID: 18064739.



Observational research methods. Research design II: cohort, cross sectional, and case-control studies

Volume 20, Issue 1

  • Department of Accident and Emergency Medicine, Taunton and Somerset Hospital, Taunton, Somerset, UK
  • Correspondence to: Dr C J Mann; tonygood{at}doctors.org.uk

Cohort, cross sectional, and case-control studies are collectively referred to as observational studies. Often these studies are the only practicable method of studying various problems, for example, studies of aetiology, instances where a randomised controlled trial might be unethical, or if the condition to be studied is rare. Cohort studies are used to study incidence, causes, and prognosis. Because they measure events in chronological order they can be used to distinguish between cause and effect. Cross sectional studies are used to determine prevalence. They are relatively quick and easy but do not permit distinction between cause and effect. Case controlled studies compare groups retrospectively. They seek to identify possible predictors of outcome and are useful for studying rare diseases or outcomes. They are often used to generate hypotheses that can then be studied via prospective cohort or other studies.

  • research methods
  • cohort study
  • case-control study
  • cross sectional study

http://dx.doi.org/10.1136/emj.20.1.54


Cohort, cross sectional, and case-control studies are often referred to as observational studies because the investigator simply observes. No interventions are carried out by the investigator. With the recent emphasis on evidence based medicine and the formation of the Cochrane Database of randomised controlled trials, such studies have been somewhat glibly maligned. However, they remain important because many questions can be efficiently answered by these methods and sometimes they are the only methods available.

The objective of most clinical studies is to determine one of the following: prevalence, incidence, cause, prognosis, or effect of treatment. It is therefore useful to remember which type of study is most commonly associated with each objective (table 1).


While an appropriate choice of study design is vital, it is not sufficient. The hallmark of good research is the rigor with which it is conducted. A checklist of the key points in any study irrespective of the basic design is given in box 1.

Study purpose

The aim of the study should be clearly stated.

The sample should accurately reflect the population from which it is drawn.

The source of the sample should be stated.

The sampling method should be described and the sample size should be justified.

Entry criteria and exclusions should be stated and justified.

The number of patients lost to follow up should be stated and explanations given.

Control group

The control group should be easily identifiable.

The source of the controls should be explained—are they from the same population as the sample?

Are the controls matched or randomised to minimise bias and confounding?

Quality of measurements and outcomes

Validity—are the measurements used regarded as valid by other investigators?

Reproducibility—can the results be repeated or is there a reason to suspect they may be a “one off”?

Blinded—were the investigators or subjects aware of their subject/control allocation?

Quality control—has the methodology been rigorously adhered to?

Completeness

Compliance—did all patients comply with the study?

Drop outs—how many failed to complete the study?

Missing data—how much are unavailable and why?

Distorting influences

Extraneous treatments—other interventions that may have affected some but not all of the subjects.

Confounding factors—Are there other variables that might influence the results?

Appropriate analysis—Have appropriate statistical tests been used?

All studies should be internally valid. That is, the conclusions can be logically drawn from the results produced by an appropriate methodology. For a study to be regarded as valid it must be shown that it has indeed demonstrated what it says it has. A study that is not internally valid should not be published because the findings cannot be accepted.

The question of external validity relates to the value of the results of the study to other populations, that is, the generalisability of the results. For example, a study showing that 80% of the Swedish population has blond hair might be used to make a sensible prediction of the prevalence of blond hair in other Scandinavian countries, but would be invalid if applied to most other populations.

Every published study should contain sufficient information to allow the reader to analyse the data with reference to these key points.

In this article each of the three important observational research methods will be discussed with emphasis on their strengths and weaknesses. In so doing it should become apparent why a given study used a particular research method and which method might best answer a particular clinical problem.

COHORT STUDIES

These are the best method for determining the incidence and natural history of a condition. The studies may be prospective or retrospective and sometimes two cohorts are compared.

Prospective cohort studies

A group of people is chosen who do not have the outcome of interest (for example, myocardial infarction). The investigator then measures a variety of variables that might be relevant to the development of the condition. Over a period of time the people in the sample are observed to see whether they develop the outcome of interest (that is, myocardial infarction).

In single cohort studies those people who do not develop the outcome of interest are used as internal controls.

Where two cohorts are used, one group has been exposed to or treated with the agent of interest and the other has not, thereby acting as an external control.

Retrospective cohort studies

These use data already collected for other purposes. The methodology is the same but the study is performed post hoc. The cohort is “followed up” retrospectively. The study period may be many years but the time to complete the study is only as long as it takes to collate and analyse the data.

Advantages and disadvantages

The use of cohorts is often mandatory as a randomised controlled trial may be unethical; for example, you cannot deliberately expose people to cigarette smoke or asbestos. Thus research on risk factors relies heavily on cohort studies.

As cohort studies measure potential causes before the outcome has occurred the study can demonstrate that these “causes” preceded the outcome, thereby avoiding the debate as to which is cause and which is effect.

A further advantage is that a single study can examine various outcome variables. For example, cohort studies of smokers can simultaneously look at deaths from lung, cardiovascular, and cerebrovascular disease. This contrasts with case-control studies as they assess only one outcome variable (that is, whatever outcome the cases have entered the study with).

Cohorts permit calculation of the effect of each variable on the probability of developing the outcome of interest (relative risk). However, where a certain outcome is rare then a prospective cohort study is inefficient. For example, studying 100 A&E attenders with minor injuries for the outcome of diabetes mellitus will probably produce only one patient with the outcome of interest. The efficiency of a prospective cohort study increases as the incidence of any particular outcome increases. Thus a study of patients with a diagnosis of deliberate self harm in the 12 months after initial presentation would be efficiently studied using a cohort design.

Another problem with prospective cohort studies is the loss of some subjects to follow up. This can significantly affect the outcome. Taking incidence analysis as an example (incidence = new cases per period of time), it can be seen that the loss of a few cases will seriously affect the numerator and hence the calculated incidence. The rarer the condition, the more significant this effect.
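The effect of lost cases on a calculated incidence can be sketched with simple arithmetic. The Python example below uses hypothetical numbers and makes the simplifying assumption that lost cases disappear from the numerator while the denominator is unchanged.

```python
def relative_underestimate(true_cases, lost_cases, cohort_size):
    """Fraction by which incidence is underestimated when some cases go unrecorded."""
    true_incidence = true_cases / cohort_size
    observed_incidence = (true_cases - lost_cases) / cohort_size
    return (true_incidence - observed_incidence) / true_incidence

# Hypothetical cohort of 1000 people followed for one year; three cases are lost
rare = relative_underestimate(true_cases=10, lost_cases=3, cohort_size=1000)
common = relative_underestimate(true_cases=300, lost_cases=3, cohort_size=1000)
print(f"Rare outcome: incidence underestimated by {rare:.0%}")
print(f"Common outcome: incidence underestimated by {common:.0%}")
```

Losing the same three cases distorts the rare outcome far more than the common one, which is the point made above.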

Retrospective studies are much cheaper as the data have already been collected. One advantage of such a study design is the lack of bias because the outcome of current interest was not the original reason for the data to be collected. However, because the cohort was originally constructed for another purpose it is unlikely that all the relevant information will have been rigorously collected.

Retrospective cohorts also suffer the disadvantage that people with the outcome of interest are more likely to remember certain antecedents, or exaggerate or minimise what they now consider to be risk factors (recall bias).

Where two cohorts are compared one will have been exposed to the agent of interest and one will not. The major disadvantage is the inability to control for all other factors that might differ between the two groups. These factors are known as confounding variables.

A confounding variable is independently associated with both the variable of interest and the outcome of interest. For example, lung cancer (outcome) is less common in people with asthma (variable). However, it is unlikely that asthma in itself confers any protection against lung cancer. It is more probable that the incidence of lung cancer is lower in people with asthma because fewer asthmatics smoke cigarettes (confounding variable). There are a virtually infinite number of potential confounding variables that, however unlikely, could just explain the result. In the past this has been used to suggest that there is a genetic influence that makes people want to smoke and also predisposes them to cancer.
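The asthma example can be illustrated with a stratified analysis. In the Python sketch below all counts are invented so that, within smokers and within non-smokers, asthma has no effect on lung cancer risk, yet the crude comparison makes asthma look protective because fewer asthmatics smoke. The Mantel-Haenszel pooled risk ratio is one standard way to adjust for a stratifying variable; its use here is this sketch's choice, not something taken from the article.

```python
# Hypothetical counts per smoking stratum; "exposed" means having asthma
strata = {
    "smokers":     {"exp_events": 20, "exp_total": 200, "unexp_events": 80, "unexp_total": 800},
    "non-smokers": {"exp_events": 8,  "exp_total": 800, "unexp_events": 2,  "unexp_total": 200},
}

def crude_risk_ratio(strata):
    """Risk ratio ignoring the stratifying variable."""
    exp_events = sum(s["exp_events"] for s in strata.values())
    exp_total = sum(s["exp_total"] for s in strata.values())
    unexp_events = sum(s["unexp_events"] for s in strata.values())
    unexp_total = sum(s["unexp_total"] for s in strata.values())
    return (exp_events / exp_total) / (unexp_events / unexp_total)

def mantel_haenszel_risk_ratio(strata):
    """Risk ratio pooled across strata (Mantel-Haenszel estimator)."""
    numerator = denominator = 0.0
    for s in strata.values():
        n = s["exp_total"] + s["unexp_total"]
        numerator += s["exp_events"] * s["unexp_total"] / n
        denominator += s["unexp_events"] * s["exp_total"] / n
    return numerator / denominator

crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"Crude RR = {crude:.2f}, smoking-adjusted RR = {adjusted:.2f}")
```

Adjustment of this kind only works for confounders that were measured, which is why the unmeasured ones remain a concern.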

The only way to eliminate all possibility of a confounding variable is via a prospective randomised controlled study. In this type of study each type of exposure is assigned by chance and so confounding variables should be present in equal numbers in both groups.

Finally, problems can arise as a result of bias. Bias can occur in any research and reflects the potential that the sample studied is not representative of the population it was drawn from and/or the population at large. A classic example is using employed people, as they generally have better health than unemployed people. Similarly, people who respond to questionnaires tend to be fitter and more motivated than those who do not. People attending A&E departments should not be presumed to be representative of the population at large.

How to run a cohort study

If the data are readily available then a retrospective design is the quickest method. If high quality, reliable data are not available a prospective study will be required.

The first step is the definition of the sample group. Each subject must have the potential to develop the outcome of interest (for example, circumcised men should not be included in a cohort designed to study paraphimosis). Furthermore, the sample population must be representative of the general population if the study is primarily looking at the incidence and natural history of the condition (descriptive).

If, however, the aim is to analyse the relation between predictor variables and outcomes (analytical) then the sample should contain as many patients likely to develop the outcome as possible, otherwise much time and expense will be spent collecting information of little value.

Cohort studies: key points

  • Cohort studies describe incidence or natural history.
  • They analyse predictors (risk factors), thereby enabling calculation of relative risk.
  • Cohort studies measure events in temporal sequence, thereby distinguishing causes from effects.
  • Retrospective cohorts, where available, are cheaper and quicker.
  • Confounding variables are the major problem in analysing cohort studies.
  • Subject selection and loss to follow up are major potential causes of bias.

Each variable studied must be accurately measured. Variables that are relatively fixed, for example, height need only be recorded once. Where change is more probable, for example, drug misuse or weight, repeated measurements will be required.

To minimise the potential for missing a confounding variable, all probable relevant variables should be measured. If this is not done the study conclusions can be readily criticised. All patients entered into the study should also be followed up for the duration of the study. Losses can significantly affect the validity of the results. To minimise this, as much information about the patient as possible (name, address, telephone, GP, etc) needs to be recorded as soon as the patient is entered into the study. Regular contact should be made; it is hardly surprising if the subjects have moved or lost interest and become lost to follow up if they are only contacted at 10 year intervals!

Beware, follow up is usually easier in people who have been exposed to the agent of interest and this may lead to bias.

There are many famous examples of cohort studies, including the Framingham heart study, 2 the UK study of doctors who smoke, 3 and Professor Neville Butler's studies on British children born in 1958. 4 A recent example of a prospective cohort study by Davey Smith et al was published in the BMJ, 5 and a retrospective cohort design was used to assess the use of A&E departments by people with diabetes. 6

CROSS SECTIONAL STUDIES

These are primarily used to determine prevalence. Prevalence equals the number of cases in a population at a given point in time. All the measurements on each person are made at one point in time. Prevalence is vitally important to the clinician because it influences considerably the likelihood of any particular diagnosis and the predictive value of any investigation. For example, knowing that ascending cholangitis in children is very rare enables the clinician to look for other causes of abdominal pain in this patient population.
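How prevalence drives the predictive value of an investigation can be shown with Bayes' theorem. The Python sketch below is illustrative only: the sensitivity, specificity, and prevalence figures are hypothetical and are not taken from the article.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person with a positive test actually has the condition."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Same hypothetical test (95% sensitive, 95% specific) in two populations
ppv_rare = positive_predictive_value(prevalence=0.001, sensitivity=0.95, specificity=0.95)
ppv_common = positive_predictive_value(prevalence=0.20, sensitivity=0.95, specificity=0.95)
print(f"PPV when prevalence is 0.1%: {ppv_rare:.1%}")
print(f"PPV when prevalence is 20%: {ppv_common:.1%}")
```

With a rare condition most positive results are false positives, even for a good test, which is why knowing the prevalence matters to the clinician.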

Cross sectional studies are also used to infer causation.

At one point in time the subjects are assessed to determine whether they were exposed to the relevant agent and whether they have the outcome of interest. Some of the subjects will not have been exposed nor have the outcome of interest. This clearly distinguishes this type of study from the other observational studies (cohort and case controlled) where reference to either exposure and/or outcome is made.

The advantage of such studies is that subjects are not deliberately exposed, treated, or left untreated and hence there are seldom ethical difficulties. Only one group is used, data are collected only once and multiple outcomes can be studied; thus this type of study is relatively cheap.

Many cross sectional studies are done using questionnaires. Alternatively each of the subjects may be interviewed. Table 2 lists the advantages and disadvantages of each.

Any study with a low response rate can be criticised because it can miss significant differences between the responders and non-responders. At its most extreme all the non-responders could be dead! Strenuous efforts must be made to maximise the numbers who do respond. The use of volunteers is also problematic because they too are unlikely to be representative of the general population. A good way to produce a valid sample would be to randomly select people from the electoral roll and invite them to complete a questionnaire. In this way the response rate is known and non-responders can be identified. However, the electoral roll itself is not an entirely accurate reflection of the general population. A census is another example of a cross sectional study.
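The sampling approach described above can be sketched in a few lines of Python. The register, the sample size, and the set of responders are all hypothetical; the point is only that a random draw from a known list makes the response rate and the non-responders identifiable.

```python
import random

def draw_sample(register, n, seed=42):
    """Simple random sample, without replacement, from a sampling frame."""
    return random.Random(seed).sample(register, n)

# Hypothetical register of 10 000 entries standing in for an electoral roll
register = [f"voter{i:05d}" for i in range(10000)]
invited = draw_sample(register, 500)

# Hypothetical outcome: the first 320 invitees return the questionnaire
responders = invited[:320]
response_rate = len(responders) / len(invited)
non_responders = sorted(set(invited) - set(responders))
print(f"Response rate: {response_rate:.0%}; non-responders to follow up: {len(non_responders)}")
```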

Market research organisations often use cross sectional studies (for example, opinion polls). This entails a system of quotas to ensure the sample is representative of the age, sex, and social class structure of the population being studied. However, to be commercially viable they are convenience samples—only people available can be questioned. This technique is insufficiently rigorous to be used for medical research.

How to run a cross sectional study

Formulate the research question(s) and choose the sample population. Then decide what variables of the study population are relevant to the research question. A method for contacting sample subjects must be devised and then implemented. In this way the data are collected and can then be analysed.

The most important advantage of cross sectional studies is that in general they are quick and cheap. As there is no follow up, fewer resources are required to run the study.

Cross sectional studies are the best way to determine prevalence and are useful at identifying associations that can then be more rigorously studied using a cohort study or randomised controlled study.

The most important problem with this type of study is differentiating cause and effect from simple association. For example, a study finding an association between low CD4 counts and HIV infection does not demonstrate whether HIV infection lowers CD4 levels or low CD4 levels predispose to HIV infection. Moreover, male homosexuality is associated with both but causes neither. (Another example of a confounding variable).

Often there are a number of plausible explanations. For example, if a study shows a negative relation between height and age it could be concluded that people lose height as they get older, younger generations are getting taller, or that tall people have a reduced life expectancy when compared with short people. Cross sectional studies do not provide an explanation for their findings.

Rare conditions cannot efficiently be studied using cross sectional studies because even in large samples there may be no one with the disease. In this situation it is better to study a cross sectional sample of patients who already have the disease (a case series). In this way it was found in 1983 that of 1000 patients with AIDS, 727 were homosexual or bisexual men and 236 were intravenous drug abusers. 6 The conclusion that individuals in these two groups had a higher relative risk was inescapable. The natural history of HIV infection was then studied using cohort studies and efficacy of treatments via case controlled studies and randomised clinical trials.

An example of a cross sectional study was the prevalence study of skull fractures in children admitted to hospital in Edinburgh from 1983 to 1989. 7 Note that although the study period was seven years it was not a longitudinal or cohort study because information about each subject was recorded at a single point in time.

A questionnaire based cross sectional study explored the relation between A&E attendance and alcohol consumption in elderly persons. 9

A recent example can be found in the BMJ, in which the prevalence of serious eye disease in a London population was evaluated. 10

Cross sectional studies: key points

  • They are the best way to determine prevalence.
  • They are relatively quick.
  • They can study multiple outcomes.
  • They do not themselves differentiate between cause and effect or the sequence of events.

CASE-CONTROL STUDIES

In contrast with cohort and cross sectional studies, case-control studies are usually retrospective. People with the outcome of interest are matched with a control group who do not have it. Retrospectively the researcher determines which individuals were exposed to the agent or treatment or the prevalence of a variable in each of the study groups. Where the outcome is rare, case-control studies may be the only feasible approach.

As some of the subjects have been deliberately chosen because they have the disease in question case-control studies are much more cost efficient than cohort and cross sectional studies—that is, a higher percentage of cases per study.

Case-control studies determine the relative importance of a predictor variable in relation to the presence or absence of the disease. Case-control studies are retrospective and cannot therefore be used to calculate the relative risk; this requires a prospective cohort study. Case-control studies can, however, be used to calculate odds ratios, which in turn usually approximate to the relative risk.
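How closely the odds ratio approximates the relative risk depends on how common the outcome is. The Python sketch below compares the two for hypothetical risks in exposed and unexposed groups; the approximation is close when the outcome is rare and drifts apart as it becomes common.

```python
def risk_ratio_and_odds_ratio(risk_exposed, risk_unexposed):
    """Return (relative risk, odds ratio) for two underlying risks."""
    relative_risk = risk_exposed / risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return relative_risk, odds_exposed / odds_unexposed

# Hypothetical risks: the relative risk is 2 in both scenarios
rare_rr, rare_or = risk_ratio_and_odds_ratio(0.02, 0.01)
common_rr, common_or = risk_ratio_and_odds_ratio(0.60, 0.30)
print(f"Rare outcome: RR = {rare_rr:.2f}, OR = {rare_or:.2f}")
print(f"Common outcome: RR = {common_rr:.2f}, OR = {common_or:.2f}")
```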

How to run a case-control study

Decide on the research question to be answered. Formulate a hypothesis and then decide what will be measured and how. Specify the characteristics of the study group and decide how to construct a valid control group. Then compare the “exposure” of the two groups to each variable.

When conditions are uncommon, case-control studies generate a lot of information from relatively few subjects. When there is a long latent period between an exposure and the disease, case-control studies are the only feasible option. Consider the practicalities of a cohort study or cross sectional study in the assessment of new variant CJD and possible aetiologies. With fewer than 300 confirmed cases a cross sectional study would need about 200 000 subjects to include one symptomatic patient. Given a postulated latency of 10 to 30 years a cohort study would require both a vast sample size and take a generation to complete.
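The order of magnitude of that figure can be checked with a line of arithmetic. The Python sketch below assumes a source population of roughly 60 million; that number is not stated in the article and is used here only to show how an estimate of about 200 000 could arise.

```python
# Assumption: a population of about 60 million (not given in the article)
population = 60_000_000
confirmed_cases = 300  # upper bound quoted in the text

# Average number of people sampled per symptomatic patient found
subjects_per_case = population / confirmed_cases
print(f"About {subjects_per_case:,.0f} subjects per symptomatic patient")
```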

In case-control studies comparatively few subjects are required so more resources are available for studying each. In consequence a huge number of variables can be considered. This type of study is therefore useful for generating hypotheses that can then be tested using other types of study.

This flexibility of the variables studied comes at the expense of the restricted outcomes studied. The only outcome is the presence or absence of the disease or whatever criterion was chosen to select the cases.

The major problems with case-control studies are the familiar ones of confounding variables (see above) and bias. Bias may take two major forms.

Sampling bias

The patients with the disease may be a biased sample (for example, patients referred to a teaching hospital) or the controls may be biased (for example, volunteers, different ages, sex or socioeconomic group).

Observation and recall bias

As the study assesses predictor variables retrospectively there is great potential for a biased assessment of their presence and significance by the patient or the investigator, or both.

Overcoming sampling bias

Ideally the cases studied should be a random sample of all the patients with the disease. This is not only very difficult but in many instances is impossible because many cases may not have been diagnosed or have been misdiagnosed. For example, many cases of non-insulin dependent diabetes will not have sought medical attention and therefore be undiagnosed. Conversely many psychiatric diseases may be differently labelled in different countries and even by different doctors in the same country. As a result they will be misdiagnosed for the purposes of the study. However, in reality you are often left studying a sample of those patients who it is possible to recruit. Selecting the controls is often a more difficult problem.

To enable the controls to represent the same population as the cases, one of four techniques may be used.

A convenience sample—sampled in the same way as the cases, for example, attending the same outpatient department. While this is certainly convenient it may reduce the external validity of the study.

Matching. The controls may be a matched or unmatched random sample from the unaffected population. Again the problem of controlling for unknown influences is present, but if the controls are too closely matched they may not be representative of the general population. “Over matching” may cause the true difference to be underestimated.

The advantage of matching is that it allows a smaller sample size for any given effect to be statistically significant.

Using two or more control groups. If the study demonstrates a significant difference between the patients with the outcome of interest and those without, even when the latter have been sampled in a number of different ways (for example, outpatients, in patients, GP patients) then the conclusion is more robust.

Using a population based sample for both cases and controls. It is possible to take a random sample of all the patients with a particular disease from specific registers. The control group can then be constructed by selecting age and sex matched people randomly selected from the same population as the area covered by the disease register.
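The population based approach can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (`id`, `sex`, `age`), the five-year age bands, and the function name `sample_matched_controls` are all hypothetical, and a real study would draw from an actual register and population list.

```python
import random


def sample_matched_controls(cases, population, controls_per_case=1, seed=0):
    """Randomly pick controls of the same sex and age band for each case.

    `cases` and `population` are lists of dicts with hypothetical keys
    "id", "sex" and "age". Controls are drawn without replacement, so nobody
    serves as a control for two cases.
    """
    rng = random.Random(seed)
    case_ids = {person["id"] for person in cases}
    used = set()
    matched = {}
    for case in cases:
        band = case["age"] // 5  # five-year age band (an arbitrary choice)
        eligible = [
            p for p in population
            if p["id"] not in case_ids
            and p["id"] not in used
            and p["sex"] == case["sex"]
            and p["age"] // 5 == band
        ]
        # Take as many controls as are available, up to the requested number.
        chosen = rng.sample(eligible, min(controls_per_case, len(eligible)))
        used.update(p["id"] for p in chosen)
        matched[case["id"]] = [p["id"] for p in chosen]
    return matched


# Invented population of 200 people; two of them are treated as cases.
population = [
    {"id": i, "sex": "F" if i % 2 else "M", "age": 40 + i % 20}
    for i in range(200)
]
cases = [population[3], population[10]]
matched = sample_matched_controls(cases, population, controls_per_case=2)
print(matched)
```

Fixing the random seed makes the selection reproducible, which is useful when the sampling procedure has to be documented.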

Overcoming observation and recall bias

Overcoming retrospective recall bias can be achieved by using data recorded, for other purposes, before the outcome had occurred and therefore before the study had started. The success of this strategy is limited by the availability and reliability of the data collected. Another technique is blinding, where neither the subject nor the observer knows whether the subject is a case or a control. Nor are they aware of the study hypothesis. In practice this is often difficult or impossible and only partial blinding is practicable. It is usually possible to blind the subjects and observers to the study hypothesis by asking spurious questions. Observers can also be easily blinded to the case or control status of the patient where the relevant observation is not of the patient themselves but a laboratory test or radiograph.

Case-control studies

Case-control studies are simple to organise

Retrospectively compare two groups

Aim to identify predictors of an outcome

Permit assessment of the influence of predictors on outcome via calculation of an odds ratio

Useful for hypothesis generation

Can only look at one outcome

Bias is a major problem

Blinding cases to their case or control status is usually impracticable as they already know that they have a disease or illness. Similarly observers can hardly be blinded to the presence of physical signs, for example, cyanosis or dyspnoea.

As a result of the problems of matching, bias and confounding, case-control studies are often flawed. They are however useful for generating hypotheses. These hypotheses can then be tested more rigorously by other methods, such as randomised controlled trials or cohort studies.

Case-control studies are very common. They are particularly useful for studying infrequent events, for example, cot death, survival from out of hospital cardiac arrest, and toxicological emergencies.

A recent example was the study of atrial fibrillation in middle aged men during exercise. 11

USING DATABASES FOR RESEARCH (SECONDARY DATA)

Pre-existing databases provide an excellent and convenient source of data. There are a host of such databases and the increasing archiving of information on computers means that this is an enlarging area for obtaining data. Table 3 lists some common examples of potentially useful databases.

Such databases enable vast numbers of people to be entered into a study prospectively or retrospectively. They can be used to construct a cohort, to produce a sample for a cross sectional study, or to identify people with certain conditions or outcomes and produce a sample for a case-control study. A recent study used census data from 11 countries to look at the relation between social class and mortality in middle aged men. 12

These types of data are ordinarily collected by people other than the researcher and independently of any specific hypothesis. The opportunity for observer bias is thus diminished. The use of previously collected data is efficient and comparatively inexpensive, and moreover the data are collected in a very standardised way, permitting comparisons over time and between different countries. However, because the data are collected for other purposes they may not be ideally suited to the testing of the current hypothesis, and they may be incomplete. This may result in sampling bias. For example, the electoral roll depends upon registration by each individual. Many homeless, mentally ill, and chronically sick people will not be registered. Similarly the notification of certain communicable diseases is a statutory responsibility for doctors in the UK: while it is probable that most cases of cholera are reported it is highly unlikely that most cases of food poisoning are.

Causes and associations

Because observational studies are not experiments (as are randomised controlled trials) it is difficult to control many external variables. In consequence when faced with a clear and significant association between some form of illness or cause of death and some environmental influence a judgement has to be made as to whether this is a causal link or simply an association. Table 4 outlines the points to be considered when making this judgement. 13

None of these judgements can provide indisputable evidence of cause and effect, but taken together they do permit the investigator to answer the fundamental questions “is there any other way to explain the available evidence?” and “is there any other more likely than cause and effect?”

Qualitative studies can produce high quality information but all such studies can be influenced by known and unknown confounding variables. Appropriate use of observational studies permits investigation of prevalence, incidence, associations, causes, and outcomes. Where there is little evidence on a subject they are cost effective ways of producing and investigating hypotheses before larger and more expensive study designs are embarked upon. In addition they are often the only realistic choice of research methodology, particularly where a randomised controlled trial would be impractical or unethical.

Cohort studies look forwards in time by following up each subject

Subjects are selected before the outcome of interest is observed

They establish the sequence of events

Numerous outcomes can be studied

They are the best way to establish the incidence of a disease

They are a good way to determine causes of diseases

The principal summary statistic of cohort studies is the relative risk (risk ratio)

If prospective, they are expensive and often take a long time for sufficient outcome events to occur to produce meaningful results

Cross sectional studies look at each subject at one point in time only

Subjects are selected without regard to the outcome of interest

Less expensive

They are the best way to determine prevalence

The principal summary statistic of cross sectional studies is the odds ratio

Weaker evidence of causality than cohort studies

Inaccurate when studying rare conditions

Case-control studies look back at what has happened to each subject

Subjects are selected specifically on the basis of the outcome of interest

Efficient (small sample sizes)

Produce odds ratios that approximate to relative risks for each variable studied

Prone to sampling bias and retrospective analysis bias

Only one outcome is studied

GLOSSARY OF TERMS

Bias

The inclusion of subjects or methods such that the results obtained are not truly representative of the population from which they are drawn.

Blinding

The process by which the researcher and/or the subject is ignorant of which intervention or exposure has occurred.

Cochrane database

An international collaborative project collating peer reviewed prospective randomised clinical trials.

Cohort

A component of a population identified so that one or more characteristics can be studied as it ages through time.

Confounding variable

A variable that is associated with both the exposure and outcome of interest that is not the variable being studied.

Controls

A group of people without the condition of interest, or unexposed to or not treated with the agent of interest.

False positive

A test result that suggests that the subject has a specific disease or condition when in fact the subject does not.

Incidence

A rate, and therefore always related either explicitly or by implication to a time period. With regard to disease it can be defined as the number of new cases that develop during a specified time interval.

Latent period

A period of time between exposure to an agent and the development of symptoms, signs, or other evidence of changes associated with that exposure.

Matching

The process by which each case is matched with one or more controls, which have been deliberately chosen to be as similar as possible to the test subjects in all regards other than the variable being studied.

Observational study

A study in which no intervention is made (in contrast with an experimental study). Such studies provide estimates and examine associations of events in their natural settings without recourse to experimental intervention.

Odds ratio

The odds of an event are the ratio of the probability of it occurring to the probability of it not occurring. In a clinical setting the odds ratio is the odds of a condition occurring in the exposed group divided by the odds of it occurring in the non-exposed group.

Prevalence

Not defined by a time interval and therefore not a rate. It may be defined as the number of cases of a disease that exist in a defined population at a specified point in time.

Randomised controlled trial

Subjects are assigned by statistically randomised methods to two or more groups. In doing so it is assumed that all variables other than the proposed intervention are evenly distributed between the groups. In this way bias is minimised.

Relative risk

This is the ratio of the probability of developing the condition if exposed to a certain variable compared with the probability if not exposed.

Response rate

The proportion of subjects who respond to either a treatment or a questionnaire.

Risk factor

A variable associated with a specific disease or outcome.

Validity—internal

The rigour with which a study has been designed and executed—that is, can the conclusion be relied upon?

Validity—external

The usefulness of the findings of a study with respect to other populations.

Variable

A value or quality that can vary between subjects and/or over time.


Study design for cohort studies.

Study design for cross sectional studies

Study design for case-control studies.

  1. Fowkes F, Fulton P. Critical appraisal of published research: introductory guidelines. BMJ 1991;302:1136–40.
  2. Lerner DJ, Kannel WB. Patterns of coronary heart disease morbidity and mortality in the sexes: a 26 year follow-up of the Framingham population. Am Heart J 1986;111:383–90.
  3. Doll R, Peto H. Mortality in relation to smoking. 40 years observation on female British doctors. BMJ 1989;208:967–73.
  4. Alberman ED, Butler NR, Sheridan MD. Visual acuity of a national sample (1958 cohort) at 7 years. Dev Med Child Neurol 1971;13:9–14.
  5. Davey Smith G, Hart C, Blane D, et al. Adverse socioeconomic conditions in childhood and cause specific mortality: prospective observational study. BMJ 1998;316:1631–5.
  6. Goyder EC, Goodacre SW, Botha JL, et al. How do individuals with diabetes use the accident and emergency department? J Accid Emerg Med 1997;14:371–4.
  7. Jaffe HW, Bregman DJ, Selik RM. Acquired immune deficiency in the US: the first 1000 cases. J Inf Dis 1983;148:339–45.
  8. Johnstone AJ, Zuberi SH, Scobie WH. Skull fractures in children: a population study. J Accid Emerg Med 1996;13:386–9.
  9. van der Pol V, Rodgers H, Aitken P, et al. Does alcohol contribute to accident and emergency department attendance in elderly people? J Accid Emerg Med 1996;13:258–60.
  10. Reidy A, Minassian DC, Vafadis G, et al. BMJ 1998;316:1643–7.
  11. Karjaleinen, Kujala U, Kaprio J, et al. BMJ 1998;316:1784–5.
  12. Kunst A, Groenhof F, Mackenbach J. BMJ 1998;316:1636–42.
  13. Hill AB, Hill ID. Bradford Hills principles of medical statistics. 12th edn. London: Edward Arnold, 1991.


GFMER Geneva Foundation for Medical Education and Research


Training course in research methodology and research protocol development 2021


Advantages and disadvantages of cohort and case-control studies

Cohort studies

Advantages.

  • Allow complete information on the subject’s exposure, including quality control of data, and experience thereafter.
  • Provide a clear temporal sequence of exposure and disease.
  • Give an opportunity to study multiple outcomes related to a specific exposure.
  • Permit calculation of incidence rates (absolute risk) as well as relative risk.
  • Methodology and results are easily understood by non-epidemiologists.
  • Enable the study of relatively rare exposures.

Disadvantages.

  • Not suited for the study of rare diseases because a large number of subjects is required.
  • Not suited when the time between exposure and disease manifestation is very long, although this can be overcome in historical cohort studies.
  • Exposure patterns, for example the composition of oral contraceptives, may change during the course of the study and make the results irrelevant.
  • Maintaining high rates of follow-up can be difficult.
  • Expensive to carry out because a large number of subjects is usually required.
  • Baseline data may be sparse because the large number of subjects does not allow for long interviews.
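As a small worked illustration of the point that cohort studies permit calculation of incidence as well as relative risk, the sketch below uses invented follow-up totals for an exposed and an unexposed group. All numbers and function names are hypothetical.

```python
def cumulative_incidence(new_cases, people_at_risk):
    """Proportion of people at risk who develop the outcome during follow-up."""
    return new_cases / people_at_risk


def incidence_rate(new_cases, person_years):
    """New cases per person-year of follow-up."""
    return new_cases / person_years


# Hypothetical cohort: counts are invented for illustration only.
exposed = {"cases": 30, "people": 1000, "person_years": 4800}
unexposed = {"cases": 15, "people": 2000, "person_years": 9800}

risk_exposed = cumulative_incidence(exposed["cases"], exposed["people"])
risk_unexposed = cumulative_incidence(unexposed["cases"], unexposed["people"])
relative_risk = risk_exposed / risk_unexposed

rate_exposed = incidence_rate(exposed["cases"], exposed["person_years"])
rate_unexposed = incidence_rate(unexposed["cases"], unexposed["person_years"])
rate_ratio = rate_exposed / rate_unexposed

print("relative risk:", round(relative_risk, 2))
print("rate ratio:", round(rate_ratio, 2))
```

The relative risk compares proportions of people, whereas the rate ratio also accounts for how long each group was followed, so the two need not be identical.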

Case-control studies

Advantages.

  • Permit the study of rare diseases.
  • Permit the study of diseases with long latency between exposure and manifestation.
  • Can be launched and conducted over relatively short time periods.
  • Relatively inexpensive as compared to cohort studies.
  • Can study multiple potential causes of disease.

Disadvantages.

  • Information on exposure and past history is primarily based on interview and may be subject to recall bias.
  • Validation of information on exposure is difficult, or incomplete, or even impossible.
  • By definition, concerned with one disease only.
  • Cannot usually provide information on incidence rates of disease.
  • Generally incomplete control of extraneous variables.
  • Choice of appropriate control group may be difficult.
  • Methodology may be hard to comprehend for non-epidemiologists and correct interpretation of results may be difficult.



6.3 - Comparing & Combining Case-Control and Cohort Studies

Nested Case-Control Study Design

This is a case-control study within a cohort study. At the beginning of the cohort study (t0), members of the cohort are assessed for risk factors. Cases and controls are identified subsequently at time t1. The control group is selected from the risk set (cohort members who do not meet the case definition at t1). Typically, the nested case-control study is less than 20% of the parent cohort.
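One common way to select controls from the risk set is to sample, for each case, from the cohort members who are still under follow-up and disease-free at the moment that case occurs. The sketch below illustrates this under stated assumptions: the record keys (`id`, `time`, `event`) and the function name are hypothetical, and people whose follow-up ends at exactly the same time as a case are excluded from that case's risk set as a simplification.

```python
import random


def risk_set_sample(cohort, controls_per_case=1, seed=0):
    """Sample controls for each case from the cohort members still at risk.

    `cohort` is a list of dicts with hypothetical keys:
      "id"    - identifier
      "time"  - follow-up time at which the person became a case or was censored
      "event" - True if the person became a case at "time"
    """
    rng = random.Random(seed)
    sets = []
    for case in sorted((p for p in cohort if p["event"]), key=lambda p: p["time"]):
        # The risk set: everyone else still under follow-up and disease-free
        # when this case occurs. People who become cases later are eligible.
        risk_set = [
            p for p in cohort
            if p["id"] != case["id"] and p["time"] > case["time"]
        ]
        k = min(controls_per_case, len(risk_set))
        controls = rng.sample(risk_set, k)
        sets.append({
            "case": case["id"],
            "time": case["time"],
            "controls": [p["id"] for p in controls],
        })
    return sets


# Invented five-person cohort with two cases.
cohort = [
    {"id": 1, "time": 2.0, "event": True},
    {"id": 2, "time": 5.0, "event": False},
    {"id": 3, "time": 3.5, "event": True},
    {"id": 4, "time": 1.0, "event": False},
    {"id": 5, "time": 6.0, "event": False},
]
sampled = risk_set_sample(cohort, controls_per_case=2)
print(sampled)
```

In this toy cohort, person 4 left follow-up before either case occurred and so can never be chosen, while person 3 can serve as a control for the earlier case before becoming a case.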

Advantages of nested case-control

  • Efficient – not all members of the parent cohort require diagnostic testing
  • Flexible – allows testing of hypotheses not anticipated when the cohort was drawn (at t0)
  • Reduces selection bias – cases and controls sampled from the same population
  • Reduces information bias – risk factor exposure can be assessed with an investigator blind to case status

Disadvantages

  • Reduces power (from parent cohort) because of reduced sample size by 1/(c+1), where c = number of controls per case
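The 1/(c+1) figure can be read alongside a commonly quoted approximation: with c controls per case, a matched design retains roughly c/(c+1) of the statistical efficiency of using the whole cohort. This approximation is an assumption here, and it holds best when the true association is weak. A minimal sketch:

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency relative to the full cohort: c / (c + 1)."""
    c = controls_per_case
    return c / (c + 1)


for c in range(1, 6):
    kept = relative_efficiency(c)
    print(f"c={c}: efficiency kept {kept:.3f}, lost {1 - kept:.3f}")
```

The gain from each additional control shrinks quickly, which is one reason designs rarely go beyond four or five controls per case.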

Nested case-control studies can be matched, not matched, or counter-matched. Matching cases to controls according to baseline measurements of one or several confounding variables is done to control for the effect of confounding variables.

A counter-matched study, in contrast, matches cases to controls who have a different baseline risk factor exposure level. The counter-matched study design is used to specifically assess the impact of this risk factor; it is especially good for assessing the potential interaction (effect modification!) of the secondary risk factor and the primary risk factor. Counter-matched controls are randomly selected from different strata of risk factor exposure levels in order to maximize variation in risk exposures among the controls. For example, in a study of the risk for bladder cancer from alcohol consumption, you might match cases to controls who smoke different amounts to see if the effect of smoking is only evident at a minimum level of exposure.

Example of a Nested Case-Control Study: Familial, psychiatric, and socioeconomic risk factors for suicide in young people: a nested case-control study. In a cohort study of risk factors for suicide, Agerbo et al. (2002) enrolled 496 young people who had committed suicide during 1981-97 in Denmark, matched for sex, age, and time to 24,800 controls. Read how they matched each case to a representative random subsample of 50 people born the same year!

Case-Cohort Study Design

A case-cohort study is similar to a nested case-control study in that the cases and non-cases are within a parent cohort; cases and non-cases are identified at time t1, after baseline. In a case-cohort study, the cohort members were assessed for risk factors at any time prior to t1. Non-cases are randomly selected from the parent cohort, forming a subcohort. No matching is performed.
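The sampling step can be sketched as follows. This sketch draws the subcohort at random from the whole parent cohort at baseline, which is one common formulation and is an assumption here. Any subcohort member who later becomes a case is counted with the cases. The record keys and the function name are hypothetical.

```python
import random


def case_cohort_sample(cohort, subcohort_fraction=0.1, seed=0):
    """Draw a random subcohort and list the cases and sampled non-cases.

    `cohort` is a list of dicts with hypothetical keys "id" and "event"
    (True if the person met the case definition by t1).
    """
    rng = random.Random(seed)
    size = max(1, round(subcohort_fraction * len(cohort)))
    subcohort = rng.sample(cohort, size)  # simple random sample, no matching
    return {
        "subcohort": sorted(p["id"] for p in subcohort),
        # Every case in the parent cohort is used, inside the subcohort or not.
        "cases": [p["id"] for p in cohort if p["event"]],
        # The comparison group: subcohort members who did not become cases.
        "non_cases": sorted(p["id"] for p in subcohort if not p["event"]),
    }


# Invented parent cohort of 100 people, of whom 10 become cases.
parent = [{"id": i, "event": i % 10 == 0} for i in range(100)]
design = case_cohort_sample(parent, subcohort_fraction=0.2)
print(len(design["subcohort"]), len(design["cases"]), len(design["non_cases"]))
```

Because the subcohort is chosen without reference to any particular outcome, the same subcohort can be reused as the comparison group for a different outcome.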

Advantages of Case-Cohort Study:

Similar to nested case-control study design:

  • Efficient– not all members of the parent cohort require diagnostic testing
  • Flexible– allows testing hypotheses not anticipated when the cohort was drawn (t0)
  • Reduces selection bias – cases and non-cases sampled from the same population
  • Reduced information bias – risk factor exposure can be assessed with an investigator blind to case status

Other advantages, as compared to nested case-control study design:

  • The subcohort can be used to study multiple outcomes
  • Risk can be measured at any time up to t1  (e.g. elapsed time from a variable event, such as menopause, or birth)
  • Subcohort can be used to calculate person-time risk

Disadvantages of Case-Cohort Study:

As compared to nested case-control study design:

  • Increased potential for information bias because the subcohort may have been established after t0 and exposure information may have been collected at different times (e.g. potential for sample deterioration)

  • Annals of Clinical Epidemiolog ...
  • Volume 4 (2022) Issue 2

Department of Health Services Research, Faculty of Medicine, University of Tsukuba Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine

Tokyo University of Science, Department of Information and Computer Technology


2022 Volume 4 Issue 2 Pages 33-40

  • Published: 2022. Available on J-STAGE: April 04, 2022.


Matching is a technique through which patients with and without an outcome of interest (in case-control studies) or patients with and without an exposure of interest (in cohort studies) are sampled from an underlying cohort to have the same or similar distributions of some characteristics. This technique is used to increase the statistical efficiency and cost efficiency of studies. In case-control studies, besides time in risk set sampling, controls are often matched for each case with respect to important confounding factors, such as age and sex, and covariates with a large number of values or levels, such as area of residence (e.g., post code) and clinics/hospitals. In the statistical analysis of matched case-control studies, fixed-effect models such as the Mantel-Haenszel odds ratio estimator and conditional logistic regression model are needed to stratify matched case-control sets and remove selection bias artificially introduced by sampling controls. In cohort studies, exact matching is used to increase study efficiency and remove or reduce confounding effects of matching factors. Propensity score matching is another matching method whereby patients with and without exposure are matched based on estimated propensity scores to receive exposure. If appropriately used, matching can improve study efficiency without introducing bias and could also present results that are more intuitive for clinicians.

Matching is mainly used in observational studies, including case-control and cohort studies. Matching is a technique by which patients with and without an outcome of interest (in case-control studies) or patients with and without an exposure of interest (in cohort studies) are sampled from an underlying cohort to have the same or similar distributions of characteristics such as age and sex.

The main purpose of matching is to increase study efficiency for data collection and subsequent statistical analysis. Matching helps researchers reduce the volume of data for collection without much loss of information (i.e., improving cost efficiency) and obtain more precise estimates than simple random sampling of the same number of patients (i.e., improving statistical efficiency). In addition, in cohort studies, matching can remove or reduce confounding effects of matching factors.

This paper aims to introduce basic principles of matching in case-control and cohort studies, with some recent examples.

Fig. 1 Graphical representation of cumulative incidence sampling (A), case-control sampling (B), and risk set sampling (C) for 10 example patients in a cohort. ● indicates an outcome onset and time at selection as a case. ○ indicates time at selection as a control.


Fig. 2 Graphical representation of a risk set sampling for 10 example patients in a population-based cohort. ● indicates an outcome onset and time at selection as a case. ○ indicates time at selection as a control.


In a study requiring primary data collection, case-control study designs are efficient because only information on cases and selected controls, instead of all people in the underlying cohort, is collected and used for statistical analysis. Especially for rare outcomes, a cohort study recruiting many people to observe a sufficient number of outcomes is not feasible. However, a case-control design would still be feasible, with reduced costs and efforts.


Similar to cohort studies, case-control studies typically require confounder adjustment using stratified analysis or regression modeling. To further improve statistical efficiency in adjusted analyses, case-control studies may match controls on confounders to be adjusted for, i.e., sampling a control(s) with an identical (or nearly identical) value of confounders for each case. When the total number of cases and controls to be sampled is fixed, the adjusted odds ratio estimates are likely to be less variable (i.e., more statistically efficient) in case-control data matched on strong confounders than in unmatched data.

Besides common confounding factors such as age and sex, area of residence (e.g., post code) or clinics/hospitals (which patients are registered to or visit) are sometimes matched between cases and controls. If variables with a large number of values or levels (e.g., over 1,000 post codes or clinics/hospitals) are adjusted for as “surrogate” confounders in the statistical analysis, at least one case and one control in each area (or clinic/hospital) are needed; otherwise, the data are discarded in the fixed-effect models (stratification). Although a case and control may rarely come from the same area (or clinic/hospital) in unmatched case-control sampling, matching can ensure that the pairs (or sets) of cases and controls are derived from the same area (or clinics/hospitals). Consequently, the odds ratio adjusted for these variables can be efficiently estimated.


Sometimes the prespecified number of controls cannot be found for a case. For example, in a case-control study planning 1:4 matching, fewer than four controls may be available for some cases. However, it is not necessary to exclude these sets when matching factors or matched sets of cases and controls are stratified in the analysis. The mixture of sets with different matching ratios will not result in a biased estimate as long as an adequate adjustment for matching factors is adopted.


To remove the selection bias artificially introduced by case-control matching, it is necessary to “stratify” data on matching factors in the statistical analysis. One traditional method is the Mantel-Haenszel odds ratio estimator that stratifies on matching factors themselves (e.g., subgroups by age group and sex, if controls are matched on these factors) or matched sets (e.g., each pair of a case and control). The Mantel-Haenszel estimator adjusts for matching factors as fixed effects and estimates a common odds ratio assumed to be constant across strata. The Mantel-Haenszel odds ratio estimator consistently estimates the common odds ratio when each stratum contains sparse data (e.g., only two patients, one case and one control, in each stratum) but the number of strata increases. Adjusting for confounding factors besides the matching factors by additional stratification within the matching factor strata is infeasible.
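A minimal sketch of the Mantel-Haenszel estimator follows, with each stratum given as a 2x2 table. The function name, the table layout, and the pair counts are illustrative assumptions. The example treats each 1:1 matched pair as its own stratum, in which case the estimator reduces to the ratio of the two kinds of discordant pairs.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across strata.

    Each stratum is a tuple (a, b, c, d):
      a: exposed cases      b: exposed controls
      c: unexposed cases    d: unexposed controls
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined: no informative strata")
    return numerator / denominator


# Hypothetical 1:1 matched study, one stratum per pair (counts are invented).
pairs = (
    [(1, 0, 0, 1)] * 30    # case exposed, control unexposed
    + [(0, 1, 1, 0)] * 10  # case unexposed, control exposed
    + [(1, 1, 0, 0)] * 40  # both exposed (concordant)
    + [(0, 0, 1, 1)] * 20  # both unexposed (concordant)
)
print(mantel_haenszel_or(pairs))
```

With these invented counts the estimate is 30/10 = 3.0. The concordant pairs add nothing to either sum, so only discordant pairs inform the estimate.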


Finally, time at matching (time from cohort entry, calendar time, or possibly age as time from birth) can be considered one of the “matching factors” in risk set sampling. If the hazard of disease incidence varies with time and the exposure prevalence changes during follow-up, time should be accounted for as a “confounder.” To do so, one can use the Mantel-Haenszel odds ratio estimator or a conditional logistic regression model, which estimates the hazard ratio constant over time (and across other matching factors, if any) that would be modeled by the Cox proportional hazards model in an underlying cohort.


A matched cohort study may also be conducted from a practical viewpoint: it would provide an intuitive presentation of patient characteristics in “comparable” exposure groups matched on important confounding factors such as age, sex, and calendar time. As crude absolute measures (such as risks and rates) during the follow-up period are easily summarized in exposed and unexposed patients, clinicians unfamiliar with statistical analysis can grasp the difference between the two groups in a non-statistical manner.

Fig. 3 Graphical representation of a matched-pair cohort study for 10 example patients in a cohort. Solid lines indicate that people are exposed, dotted lines denote that people are not exposed, and ● indicates the incidence of outcome.

Fig. 4 Graphical representation of a matched-pair cohort study for 10 example patients in a population-based cohort. Solid lines denote that people are exposed, dotted lines denote that people are not exposed, ▼ indicates the timing of the matched-pair cohort inclusion in the exposed group, ▽ indicates the timing of the matched-pair cohort inclusion in the non-exposed group, and ● indicates the incidence of outcome.

Regarding the matching ratio, 1:4 or 1:5 is sometimes chosen in matched-pair cohort studies, whereas 1:1 may be chosen more frequently to prioritize simplicity and intuitiveness. Mixed matching ratios (meaning that, for example, some pairs are matched in a ratio of 1:4, whereas other pairs are matched by a ratio of 1:3, 1:2, or 1:1 between exposed and unexposed people) will not cause bias if matching variables or matched sets are adjusted for in the analysis. In contrast, as such varying matching ratios do not balance the distributions of matching factors in exposed and unexposed people, the unadjusted comparison in the matched cohort still suffers from confounding bias.

Matching with or without replacement remains the choice of researchers, although matching without replacement may be more intuitive for clinicians.

We have provided an overview and some recent examples of matching in case-control and cohort studies. Matching in case-control studies can increase study efficiency, in terms of both cost and statistical efficiency. Nevertheless, caution is still warranted, since inappropriate sampling of controls or statistical analysis without stratification would result in a biased estimate. In cohort studies, exact matching can increase efficiency and remove or reduce the confounding effect of matching factors, whereas propensity score matching can be used to balance the distributions of measured confounding factors between exposed and unexposed individuals. If appropriately used, matching can improve study efficiency without introducing bias and can present results that are more intuitive for clinicians.

We would like to thank Dr. Hiroyuki Ohbe of the Department of Clinical Epidemiology and Health Economics, School of Public Health, The University of Tokyo, and Dr. Motohiko Adomi in the Department of Epidemiology, Harvard T.H. Chan School of Public Health, for their critical reading of the manuscript and feedback.

No potential competing interests relevant to this paper are reported.

1. Wacholder S. The case-control study as data missing by design: estimating risk differences. Epidemiology 1996;7:144–150.
2. Noma H, Tanaka S. Analysis of case-cohort designs with binary outcomes: improving efficiency using whole-cohort auxiliary information. Stat Methods Med Res 2017;26:691–706.
3. Schuemie MJ, Ryan PB, Man KKC, Wong ICK, Suchard MA, Hripcsak G. A plea to stop using the case-control design in retrospective database studies. Stat Med 2019;38:4199–4208.
4. Schneeweiss S, Suissa S. Discussion of Schuemie et al. “A plea to stop using the case-control design in retrospective database studies”. Stat Med 2019;38:4209–4212.
5. Rothman KJ, Lash TL. 6 Epidemiologic study design with validity and efficiency considerations. Modern epidemiology 4th edition. Lippincott Williams & Wilkins, 2021:105–140.
6. Marsh JL, Hutton JL, Binks K. Removal of radiation dose response effects: an example of over-matching. BMJ 2002;325:327–330.
7. Richardson K, Fox C, Maidment I, Steel N, Loke YK, Arthur A, et al. Anticholinergic drugs and risk of dementia: case-control study. BMJ 2018;361:k1315.
8. Lapi F, Azoulay L, Yin H, Nessim SJ, Suissa S. Concurrent use of diuretics, angiotensin converting enzyme inhibitors, and angiotensin receptor blockers with non-steroidal anti-inflammatory drugs and risk of acute kidney injury: nested case-control study. BMJ 2013;346:e8525.
9. Woodward M. Epidemiology: study design and data analysis. Chapman & Hall: Boca Raton, 1999:265.
10. Hennessy S, Bilker WB, Berlin JA, Strom BL. Factors influencing the optimal control-to-case ratio in matched case-control studies. Am J Epidemiol 1999;149:195–197.
11. Wang MH, Shugart YY, Cole SR, Platz EA. A simulation study of control sampling methods for nested case-control studies of genetic and molecular biomarkers and prostate cancer progression. Cancer Epidemiol Biomarkers Prev 2009;18:706–711.
12. Pearce N. Analysis of matched case-control studies. BMJ 2016;352:i969.
13. Greenland S. Partial and marginal matching in case-control studies. Modern statistical methods in chronic disease epidemiology. Wiley: New York, NY, 1986:35–49.
14. Hayashi M, Takamatsu I, Kanno Y, Yoshida T, Abe T, Sato Y, Japanese Calciphylaxis Study Group. A case-control study of calciphylaxis in Japanese end-stage renal disease patients. Nephrol Dial Transplant 2012;27:1580–1584.
15. Iwagami M, Taniguchi Y, Jin X, Adomi M, Mori T, Hamada S, et al. Association between recorded medical diagnoses and incidence of long-term care needs certification: a case control study using linked medical and long-term care data in two Japanese cities. Annals Clin Epidemiol 2019;1:56–68.
16. Suissa S, Dell’Aniello S. Time-related biases in pharmacoepidemiology. Pharmacoepidemiol Drug Saf 2020;29:1101–1110.
17. Thomas LE, Yang S, Wojdyla D, Schaubel DE. Matching with time-dependent treatments: a review and look forward. Stat Med 2020;39:2350–2370.
18. Greenland S, Morgenstern H. Matching and efficiency in cohort studies. Am J Epidemiol 1990;131:151–159.
19. Shinozaki T, Mansournia MA, Matsuyama Y. On hazard ratio estimators by proportional hazards models in matched-pair cohort studies. Emerg Themes Epidemiol 2017;14:6.
20. Sjölander A, Greenland S. Ignoring the matching variables in cohort studies – when is it valid and why? Stat Med 2013;32:4696–4708.
21. Sutradhar R, Baxter NN, Austin PC. Terminating observation within matched pairs of subjects in a matched cohort analysis: a Monte Carlo simulation study. Stat Med 2016;35:294–304.
22. Shinozaki T, Mansournia MA. Hazard ratio estimators after terminating observation within matched pairs in sibling and propensity score matched designs. Int J Biostat 2019;15.
23. Yasunaga H. Introduction to applied statistics—chapter 1 propensity score analysis. Annals Clin Epidemiol 2020;2:33–37.
24. Abadie A, Imbens GW. Matching on the estimated propensity score. Econometrica 2016;84:781–807.
25. Shinozaki T, Nojima M. Misuse of regression adjustment for additional confounders following insufficient propensity score balancing. Epidemiology 2019;30:541–548.
26. Ohbe H, Goto T, Miyamoto Y, Yasunaga H. Risk of cardiovascular events after spouse’s ICU admission. Circulation 2020;142:1691–1693.
27. Nagasu H, Yano Y, Kanegae H, Heerspink HJL, Nangaku M, Hirakawa Y, et al. Kidney outcomes associated with SGLT2 inhibitors versus other glucose-lowering drugs in real-world clinical practice: the Japan chronic kidney disease database. Diabetes Care 2021;44:2542–2551.

  • Can J Hosp Pharm
  • v.67(5); Sep-Oct 2014

An Introduction to the Fundamentals of Cohort and Case–Control Studies

INTRODUCTION

As pharmacotherapy experts, pharmacists are continually updating their knowledge about drug effects. In addition to being knowledge users of research findings, pharmacists increasingly play a larger role in observational studies of drug effects. Observational studies are inherently nonexperimental and, unlike randomized clinical trials (RCTs), do not involve any manipulation (such as randomization) of the treatment and control groups by the investigator.

This article reviews for the practising pharmacist the fundamental design elements and foundational methodologic knowledge for conducting cohort and case–control studies, 2 common and robust observational study designs for elucidating drug–outcome associations. Readers interested in learning about other observational study designs, such as cross-sectional studies, ecological studies, case series, case reports, within-person studies, and quasi-experimental designs, or the critical appraisal of such designs, are referred elsewhere. 1 – 6

WHY WE NEED COHORT AND CASE–CONTROL STUDIES

We need well-designed and rigorous cohort and case–control studies because their findings provide knowledge complementary to that garnered from RCTs (Table 1). The design properties of RCTs maximize their ability to estimate the potential causal effects of drugs under ideal circumstances and thereby to estimate the efficacy of those drugs. However, many RCTs involve a relatively limited number of highly selected patients and a limited duration. Indeed, RCTs typically follow patients for only a small fraction of the time that the drug would be used in clinical practice, especially when the medications are for chronic diseases. Moreover, RCTs typically exclude complex patients, they often use irrelevant comparators (e.g., placebo), and they frequently measure outcomes that are not patient-centred (i.e., surrogate end points). 7 Although many of these limitations may be overcome by designing more pragmatic RCTs that do indeed measure effectiveness, 8 cohort and case–control studies are 2 feasible study design alternatives that address the limitations of RCTs (Table 1) without the considerable financial and human resource costs of pragmatic RCTs.

Table 1. Limitations of Randomized Clinical Trials (RCTs) Potentially Addressed by Cohort and Case–Control Studies

COHORT STUDIES

A cohort is a group of people who share a common experience or characteristic. The term “cohort” first appeared in the medical literature in the 1930s in an article by epidemiologist W H Frost. 9 Interestingly, the word “cohort” has military roots, originating from the Latin word “cohors”. 10 The term was first used in the Roman military, where a group of 300 to 600 soldiers constituted a cohort. 11

A cohort study compares the experience of 2 or more groups of patients who are followed concurrently forward in time (Figure 1). This prospective tracking, from exposure to outcome, is in fact one of the defining features of a cohort study. 11 The temporal sequence involved in following a group of patients who are exposed to a certain factor (the treatment group) and a group of patients who are not exposed to that factor (the control group) is akin to that of a clinical trial, where instead of chance determining a patient’s exposure status (as occurs in an RCT), choice or happenstance determines exposure status.

Figure 1. Schematic for the design of cohort and case–control studies.

Selecting the Study Cohort

For any cohort study, a source population must be defined, from which the eligible study cohort is derived through application of various inclusion and exclusion criteria. At a minimum, patients entering the study cohort must be free of the outcome of interest. For example, in a cohort study designed to measure the association between atypical antipsychotics and diabetes mellitus, patients with diabetes would have to be excluded from the study cohort because they are not at risk of the outcome. Often, other restrictions are put in place to minimize the risk of bias. For example, restriction to new users of a medication will ensure avoidance of multiple biases. 12 Inclusion of prevalent or current drug users can lead to significant bias because patients who experience early intolerance or adverse effects of a drug may discontinue the drug, and the remaining cohort will consist of a healthier and usually more adherent group. 13 Risk that varies over time, whereby new users have a higher risk of an adverse event, has been observed for numerous associations, including those between nonsteroidal anti-inflammatory drugs and upper gastrointestinal bleeding, 14 oral contraceptives and venous thromboembolism, 15 benzodiazepines and falls, 16 and angiotensin-converting enzyme inhibitors and angioedema. 17

Defining Drug Exposure Groups

Once the study cohort has been created, 2 or more exposure groups must be clearly defined, 1 of which must serve as the control or reference group. The reference group should be clinically relevant. For example, in a comparative safety or effectiveness study, patients taking a drug within the same therapeutic class or receiving usual care may serve as the reference group. If clinically and scientifically relevant, a group with no therapeutic exposure may be the reference group. Drug exposure may be measured in terms of persons or person-time (the time for which a person is exposed to a particular drug). Drug exposure is often categorized in a binary fashion (i.e., yes or no), based on either a minimum number of prescription records (e.g., at least 3 records) or a specified duration of exposure (e.g., at least 90 days’ exposure), or a combination of these 2 factors (i.e., cumulative exposure). Irrespective of how exposure is defined, it is essential that follow-up time be properly categorized following entry into the cohort to avoid time-related bias. 18 Furthermore, the definition of exposure should be coherent with the study hypothesis. For example, a certain amount of time or a certain dose of drug may be required to elicit an effect, or a drug may continue to have an effect once discontinued (e.g., bisphosphonates). Moreover, decisions about when to discontinue drug exposure must be made. There are 2 common approaches: “as treated”, whereby drug exposure is recorded as being stopped when a person no longer meets the definition of exposure; and “intention-to-treat”, whereby a person is considered exposed from the time of first meeting the study’s exposure definition until experiencing the outcome of interest or the end of the study, irrespective of changes in actual exposure status. 
There is no consensus on how to best define drug exposures, and hence the definitions of exposure often vary considerably among cohort studies assessing identical drug–outcome associations.
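
The difference between the “as treated” and “intention-to-treat” approaches can be made concrete with a small sketch that counts exposed follow-up days for one patient. The `FollowUp` record, its field names, and the day values are hypothetical and serve only to illustrate the two definitions given above.

```python
from typing import NamedTuple, Optional


class FollowUp(NamedTuple):
    start_day: int              # day the exposure definition is first met
    stop_day: Optional[int]     # day the definition is no longer met, if any
    outcome_day: Optional[int]  # day of the outcome, if it occurred
    study_end_day: int          # administrative end of follow-up


def exposed_days(p: FollowUp, approach: str) -> int:
    """Days counted as exposed under 'as_treated' or 'intention_to_treat'."""
    if approach not in ("as_treated", "intention_to_treat"):
        raise ValueError(f"unknown approach: {approach}")
    # Both approaches stop counting at the outcome or at the end of the study.
    end = p.study_end_day
    if p.outcome_day is not None:
        end = min(end, p.outcome_day)
    # Only the as-treated approach also stops when exposure stops.
    if approach == "as_treated" and p.stop_day is not None:
        end = min(end, p.stop_day)
    return max(0, end - p.start_day)


patient = FollowUp(start_day=0, stop_day=120, outcome_day=300, study_end_day=365)
print(exposed_days(patient, "as_treated"))          # 120
print(exposed_days(patient, "intention_to_treat"))  # 300
```

In this example the outcome on day 300 falls inside the exposed period only under the intention-to-treat definition. Real studies usually add further rules, such as grace periods after discontinuation, which this sketch does not model.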

Measuring Occurrence of Outcomes

Complete and accurate measurement of the outcome of interest is essential to ensure the validity of study results. When subjective outcome data (e.g., diagnosis of pneumonia) are being collected during the study period, outcome assessors and adjudicators should be blinded to exposure status, to prevent observer bias. When previously collected data (i.e., secondary data) are being used, investigators should ideally use outcome definitions that have been validated in previous studies. For example, Hux and others 19 validated definitions of diabetes by comparing International Classification of Diseases codes obtained from administrative health care databases in Ontario with diagnostic data from primary care charts.

Quantifying the Drug–Outcome Association

For cohort studies, the drug–outcome association is usually expressed as a relative risk, a relative rate, or a hazard ratio. Advanced statistical techniques are used to account for factors other than the drug exposure of interest that might distort the drug–outcome association. These factors or potential confounders are often handled simultaneously with multivariable regression models. Because these statistical models account for measured variables, it is crucial that the data source capture as many potential confounding variables (or proxies of confounders) as possible. Potential confounders should usually be measured before entry into the cohort, to avoid adjustment for factors in the causal pathway.
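
Two of the measures named above can be computed directly from crude counts. The sketch below is unadjusted and uses made-up numbers; the function names are hypothetical. A hazard ratio is not shown because it requires a time-to-event model rather than simple arithmetic.

```python
def risk_ratio(events_exposed: int, n_exposed: int,
               events_unexposed: int, n_unexposed: int) -> float:
    """Relative risk: cumulative incidence in the exposed over the unexposed."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)


def rate_ratio(events_exposed: int, person_years_exposed: float,
               events_unexposed: int, person_years_unexposed: float) -> float:
    """Relative rate: events per unit of person-time, exposed over unexposed."""
    return ((events_exposed / person_years_exposed)
            / (events_unexposed / person_years_unexposed))


# Hypothetical cohort: 30 events among 1000 exposed, 20 among 2000 unexposed.
print(round(risk_ratio(30, 1000, 20, 2000), 2))  # 3.0
# The same events over 2500 and 5000 person-years of follow-up.
print(round(rate_ratio(30, 2500, 20, 5000), 2))  # 3.0
```

These crude ratios ignore confounding. In practice they would be the starting point before adjustment with a multivariable model.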

Strengths and Weaknesses

One of the major strengths of a cohort study is that the temporal sequence—drug exposure preceding outcome—is explicit in the study design. The incidence of a particular outcome among persons exposed to a certain drug can be directly calculated using a cohort design. Cohort studies are also relatively efficient for studying rare exposures, and multiple outcomes may be assessed for a single exposure. However, cohort studies with long observation periods may be more susceptible to losses to follow-up and to inaccurate measurement of exposures and outcomes. Large numbers of patients may be required to precisely estimate meaningful drug–outcome associations, especially for rare outcomes or outcomes that take a long time to occur.

CASE–CONTROL STUDIES

The first case–control study using the design with which we are familiar today was published in 1926. However, the concept of case–control studies has its origins in the investigation of disease etiologies through detailed histories and examination of patients. 20

In a case–control study, a number of cases and noncases (controls) are identified, and the occurrence of one or more prior exposures is compared between groups to evaluate drug–outcome associations ( Figure 1 ). A case–control study runs in reverse relative to a cohort study. 21 As such, study inception occurs when a patient experiences an outcome and is thus designated a “case”. A modern conceptual view holds that the case–control study can be thought of as an efficient cohort design. Essentially, patients who would have experienced the outcome of interest in a cohort study are the cases in a case–control study. Similarly, patients who were at risk but did not experience the outcome of interest in a cohort study are the controls in a case–control study. The potential data sources for a case–control study are identical with those for a cohort study, and the investigator may collect data after study inception or may use previously collected data. An extension of the case–control study is the nested case–control study, which is a case–control study conducted within a cohort. Details regarding this design are beyond the scope of this article and are reviewed elsewhere. 22 , 23

Selection of Cases

The first step in a case–control study is to identify the cases through application of explicitly defined inclusion and exclusion criteria. Ideally, cases should be directly sampled from the source population in a manner that is unrelated to the drug exposures of interest; however, the source population that gave rise to the cases is often unknown and difficult to identify (except in a nested case–control study, where the source population is known). The case-selection process and the data sources from which cases were selected should be described in detail, especially if cases are from a variety of sources, such as hospital and community-based sources. Selecting only hospital-based cases may lead to systematic error related to hospital admission practices, whereby exposed cases may be more likely to be admitted and therefore selected into the study (a phenomenon known as Berksonian bias). Furthermore, only new (incident) cases should be selected, as nonincident cases usually over-represent long-term survivors, and diagnostic practices may change over time, introducing potential bias. When cases are selected from a secondary data source, the case definitions should be supported by previous validation studies.

Selection of Controls

The selection of controls in a case–control study is fraught with difficulty and is often the source of significant bias. Essentially, the controls should be selected from the same source population as the cases. 24 In other words, controls should be at risk of becoming cases and should reflect the exposure distribution of the population that gave rise to the cases. Multiple controls are usually selected for each case, to increase the statistical efficiency of the study; however, the gains are minimal beyond 3 or 4 controls per case. Nonetheless, modern case–control studies involving large databases often use much higher control–case ratios to maximize study precision. To control for potential confounding, cases and controls are often matched on one or more patient characteristics, such as age or sex (although it may not always be appropriate to match on these variables). The study investigator must be careful not to match on too many factors or on factors that are not confounders, as doing so might lead to overmatching and bias. Furthermore, matching should not involve variables that the investigator is interested in examining in association with an outcome. The selection of controls is one of the most difficult aspects of epidemiologic research, and readers are encouraged to consult additional resources. 24 – 28
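
The diminishing return from adding controls can be illustrated with a commonly used approximation, which is an assumption of this sketch rather than a formula given in the article: for effects near the null, m controls per case achieve roughly m/(m+1) of the statistical efficiency of an unlimited number of controls. The function name is hypothetical.

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency of m controls per case relative to unlimited controls."""
    if controls_per_case < 1:
        raise ValueError("at least one control per case is required")
    return controls_per_case / (controls_per_case + 1)


for m in (1, 2, 3, 4, 5, 10):
    print(m, round(relative_efficiency(m), 3))
# 1 0.5
# 2 0.667
# 3 0.75
# 4 0.8
# 5 0.833
# 10 0.909
```

Moving from 1 to 4 controls per case raises the approximate efficiency from 0.5 to 0.8, whereas moving from 4 to 10 adds only about 0.11 more. This is why extra controls are attractive mainly when they are nearly free, as in large databases.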

Similar to the situation for a cohort study, the drug exposures of interest and their definitions should be clearly specified in the methods. Because exposure in a case–control study is determined after the cases have been identified, a period before occurrence of the case, called the “look-back period” or “look-back window”, must be defined. A comparable look-back period must be defined for the control group. Look-back periods should consider the study hypothesis and thus may vary considerably from one study to another. For example, Abdelmoneim and others 29 specified a 120-day look-back period before the date of their cases (patients with acute coronary syndrome) to assess recent exposure to glyburide and gliclazide. Azoulay and others 30 specified an exposure window covering any time more than 1 year before the case date in their study evaluating the association between pioglitazone and bladder cancer. If the investigators are collecting exposure data themselves, then study personnel should be blinded to outcome status.
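
A look-back window can be applied to dispensing records along the following lines. This is a minimal sketch under stated assumptions: the index date is the case date (or the corresponding date assigned to a control), the window excludes the index date itself, and the function name and dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable


def exposed_in_lookback(index_date: date, dispensing_dates: Iterable[date],
                        lookback_days: int) -> bool:
    """True if any dispensing falls within the look-back window before the index date."""
    window_start = index_date - timedelta(days=lookback_days)
    return any(window_start <= d < index_date for d in dispensing_dates)


index = date(2014, 6, 30)
print(exposed_in_lookback(index, [date(2014, 1, 15), date(2014, 4, 20)], 120))  # True
print(exposed_in_lookback(index, [date(2014, 1, 15)], 120))                     # False
```

The same function would be applied to cases and controls with the same window length, so that exposure is ascertained in a comparable way in both groups.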

In a case–control study, the odds ratio is the usual measure of association reported. This measure is the ratio of the odds of exposure among cases to the odds of exposure among controls, and in most cases (in particular, when the outcome is rare) it approximates the relative risk. As in a cohort study, the analytic plan for a case–control study typically involves advanced statistical methods to adjust for multiple potential confounders.
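
The crude odds ratio and an approximate confidence interval can be computed from a single 2x2 table, as in the sketch below. The counts are invented for illustration, the function names are hypothetical, and the interval uses the standard log-based (Woolf) approximation, which is one common choice rather than the method of any particular study.

```python
import math
from typing import Tuple


def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def woolf_ci(exposed_cases: int, unexposed_cases: int,
             exposed_controls: int, unexposed_controls: int,
             z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval on the log odds ratio scale (all cells must be > 0)."""
    log_or = math.log(odds_ratio(exposed_cases, unexposed_cases,
                                 exposed_controls, unexposed_controls))
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
print(round(odds_ratio(40, 60, 20, 80), 2))  # 2.67
print(woolf_ci(40, 60, 20, 80))
```

This unadjusted estimate is only a starting point. A matched or confounded design would call for a stratified or regression-based analysis instead.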

The major strengths of the case–control design are statistical efficiency (i.e., uses fewer data to quantify a drug–outcome association than would be required in a cohort study), efficiency for studying rare outcomes, efficiency for studying conditions with long latency periods, efficiency for handling the time-varying nature of drug exposures, and relatively low cost. The weaknesses of case–control studies include inefficiency for studying rare exposures, difficulty of selecting unbiased controls, and inability to directly calculate incidence rates of outcomes.

LIMITATIONS OF COHORT AND CASE–CONTROL STUDIES

Bias and Confounding

Observational studies are methodologically difficult, susceptible to bias and confounding, and difficult to interpret, given the many types of bias potentially at play. For these reasons, observational studies are limited to studying drug–outcome associations and cannot be used to measure the causal effects of drugs. Recent methodologic advances in design and analytic techniques in pharmacoepidemiology have helped to combat the various types of selection bias, information bias, and confounding at play in cohort and case–control studies (see Appendix 1, available online at www.cjhp-online.ca/index.php/cjhp/issue/view/104/showToc). Many of these techniques can account for multiple potential confounders simultaneously. A comprehensive review of these techniques is beyond the scope of this article, but such reviews may be found elsewhere. 25 , 31 – 33 Bias and confounding result in spurious drug–outcome associations and are introduced at both the design and analysis stages. Appendix 2 (available online at www.cjhp-online.ca/index.php/cjhp/issue/view/104/showToc) illustrates the origin of bias in relation to the cohort design, and Appendix 3 (available online at www.cjhp-online.ca/index.php/cjhp/issue/view/104/showToc) lists common types of bias that occur in cohort and case–control studies of drug effects.

Study of Intended Drug Effects

Cohort and case–control studies are powerful approaches for estimating the association between drugs and unintended outcomes 34 ; however, their use for studying the intended effects of drugs has spurred debate in the past and remains controversial today. 35 – 37 This controversy has arisen because the propensity for bias and confounding is much higher when estimating the intended effects of drugs (i.e., benefits). 37 This higher propensity for bias is in turn due to the nonrandom nature of prescribing practices and is often referred to as “confounding by the reason for the prescription” or simply “confounding by indication”. Confounding by indication is expected with these types of studies, as it is good medical practice to prescribe intentionally and rationally, as opposed to prescribing according to a random process. 38 Some authors strongly recommend against using observational studies to study intended effects, suggesting instead that we consider restricting our research questions to those of unintended effects because confounding by indication introduces uncontrollable bias. 31 , 34 , 39 , 40 The literature contains numerous examples of confounding by indication. One striking example is the distorted 27-fold increased risk of thrombotic events associated with use of warfarin, when in fact warfarin prevents thrombotic events. 39 Another example of confounding by indication is the observed relationship between short-acting β-receptor agonists (e.g., salbutamol) and increased risk of death from asthma. 41 Of course, confounding by indication is not verifiable, but it must be considered when studying the intended effects of drugs.

GENERAL CONSIDERATIONS IN CONDUCTING A COHORT OR CASE–CONTROL STUDY

Protocol and Study Team

Cohort and case–control studies aim to quantitatively estimate the association between a drug exposure and outcome. Before embarking on a cohort or case–control study, the investigators must develop a well-articulated and focused research question. 42 Furthermore, the study protocol, including a detailed methodologic and analytic plan, should be consistent with international guidelines. 43 , 44 The study team should have appropriate clinical and methodologic expertise. Clinical expertise is essential for developing exposure and outcome definitions, as well as for understanding the overall clinical context of how the research question fits into the current body of knowledge. Methodologic expertise is critical for ensuring that robust methods are used, to minimize bias and confounding.

Data Sources

To estimate a drug–outcome association in a cohort or case–control study, accurate and comprehensive data must be collected on the drug exposures and outcomes of interest. Study investigators may collect data after study inception or may use previously collected data. The major advantage of prospectively collecting the data (primary data collection) is that the investigators have control over what information is collected; in contrast, when previously collected data are used (secondary data collection), the investigators are limited to the information already collected. Data may often be missing from or inaccurately recorded in secondary data sources, which creates challenges when the data are used for research purposes. Although previously collected data are considered retrospective to study inception, the data themselves are often collected prospectively; therefore, use of the terms “retrospective” and “prospective” may be misleading and usually does not provide any clarity in terms of important design characteristics. 25 There are 3 main sources of existing data: administrative data, medical records, and surveys. Special considerations and the advantages and disadvantages of these secondary data sources are discussed elsewhere. 45 , 46 For studying drug effects, secondary data sources are more commonly used than primary data collection, primarily because of gains in time, cost, and statistical efficiency. Furthermore, use of secondary data sources avoids the Hawthorne effect, whereby knowledge of participation in a study changes the behaviour of study participants and may lead to bias.

CONCLUSIONS

Pharmacists use knowledge from cohort and case–control studies to inform patients, clinicians, and the general public about drug effects. At a basic level, cohort and case–control studies quantitatively estimate the relation between exposures and outcomes. They represent rigorous study designs for answering drug safety and effectiveness questions, with case–control studies being more prone to bias. The methodologic rigour of cohort and case–control studies evaluating drug–outcome associations is advancing, and approaches are being developed and refined that limit the generation of misleading study results. Indeed, both RCTs and observational studies are necessary, and neither is sufficient to learn about the totality of drug effects in the population.

Acknowledgments

John-Michael Gamble is supported by a New Investigator Award in drug safety and effectiveness from the Canadian Institutes of Health Research and a Clinician Scientist Award from the Canadian Diabetes Association.

This article is the sixth in the CJHP Research Primer Series, an initiative of the CJHP Editorial Board and the CSHP Research Committee. The planned 2-year series is intended to appeal to relatively inexperienced researchers, with the goal of building research capacity among practising pharmacists. The articles, presenting simple but rigorous guidance to encourage and support novice researchers, are being solicited from authors with appropriate expertise.

Previous articles in this series:

Bond CM. The research jigsaw: how to get started. Can J Hosp Pharm. 2014;67(1):28–30.

Tully MP. Research: articulating questions, generating hypotheses, and choosing study designs. Can J Hosp Pharm. 2014;67(1):31–4.

Loewen P. Ethical issues in pharmacy practice research: an introductory guide. Can J Hosp Pharm. 2014;67(2):133–7.

Tsuyuki RT. Designing pharmacy practice research trials. Can J Hosp Pharm. 2014;67(3):226–9.

Bresee LC. An introduction to developing surveys for pharmacy practice research. Can J Hosp Pharm. 2014;67(4):286–91.

Competing interests: None declared.

Case-Control Studies

Introduction

Cohort studies have an intuitive logic to them, but they can be very problematic when one is investigating outcomes that only occur in a small fraction of exposed and unexposed individuals. They can also be problematic when it is expensive or very difficult to obtain exposure information from a cohort. In these situations a case-control design offers an alternative that is much more efficient. The goal of a case-control study is the same as that of cohort studies, i.e., to estimate the magnitude of association between an exposure and an outcome. However, case-control studies employ a different sampling strategy that gives them greater efficiency.

Learning Objectives

After completing this module, the student will be able to:

  • Define and explain the distinguishing features of a case-control study
  • Describe and identify the types of epidemiologic questions that can be addressed by case-control studies
  • Define what is meant by the term "source population"
  • Describe the purpose of controls in a case-control study
  • Describe differences between hospital-based and population-based case-control studies
  • Describe the principles of valid control selection
  • Explain the importance of using specific diagnostic criteria and explicit case definitions in case-control studies
  • Estimate and interpret the odds ratio from a case-control study
  • Identify the potential strengths and limitations of case-control studies

Overview of Case-Control Design

In the module entitled Overview of Analytic Studies it was noted that Rothman describes the case-control strategy as follows:

"Case-control studies are best understood by considering as the starting point a source population, which represents a hypothetical study population in which a cohort study might have been conducted. The source population is the population that gives rise to the cases included in the study. If a cohort study were undertaken, we would define the exposed and unexposed cohorts (or several cohorts) and from these populations obtain denominators for the incidence rates or risks that would be calculated for each cohort. We would then identify the number of cases occurring in each cohort and calculate the risk or incidence rate for each. In a case-control study the same cases are identified and classified as to whether they belong to the exposed or unexposed cohort. Instead of obtaining the denominators for the rates or risks, however, a control group is sampled from the entire source population that gives rise to the cases. Individuals in the control group are then classified into exposed and unexposed categories. The purpose of the control group is to determine the relative size of the exposed and unexposed components of the source population. Because the control group is used to estimate the distribution of exposure in the source population, the cardinal requirement of control selection is that the controls be sampled independently of exposure status."

To illustrate this consider the following hypothetical scenario in which the source population is the state of Massachusetts. Diseased individuals are red, and non-diseased individuals are blue. Exposed individuals are indicated by a whitish midsection. Note the following aspects of the depicted scenario:

  • The disease is rare.
  • There is a fairly large number of exposed individuals in the state, but most of these are not diseased.

Map of Massachusetts with thousands of icon people overlaid. A very small percentage of them are identified as having a rare disease.

If we somehow had exposure and outcome information on all of the subjects in the source population and looked at the association using a cohort design, we might find the data summarized in the contingency table below.

               Diseased    Non-diseased        Total
Exposed             700         999,300    1,000,000
Non-exposed         600       4,999,400    5,000,000
Total             1,300       5,998,700    6,000,000

In this hypothetical example, we have data on all 6,000,000 people in the source population, and we could compute the probability of disease (i.e., the risk or incidence) in both the exposed group and the non-exposed group, because we have the denominators for both the exposed and non-exposed groups.

The table above summarizes all of the necessary information regarding exposure and outcome status for the population and enables us to compute a risk ratio as a measure of the strength of the association. Intuitively, we compute the probability of disease (the risk) in each exposure group and then compute the risk ratio as follows:

Risk ratio = (700 / 1,000,000) / (600 / 5,000,000) = 0.00070 / 0.00012 = 5.83

The problem, of course, is that we usually don't have the resources to get the data on all subjects in the population. If we took a random sample of even 5-10% of the population, we would have few diseased people in our sample, certainly not enough to produce a reasonably precise measure of association. Moreover, we would expend an inordinate amount of effort and money collecting exposure and outcome data on a large number of people who would not develop the outcome.

We need a method that allows us to retain all the people in the numerator of disease frequency (diseased people or "cases") but allows us to collect information from only a small proportion of the people that make up the denominator (population, or "controls"), most of whom do not have the disease of interest. The case-control design allows us to accomplish this. We identify and collect exposure information on all the cases, but identify and collect exposure information on only a sample of the population. Once we have the exposure information, we can assign subjects to the numerator and denominator of the exposed and unexposed groups. This is what Rothman means when he says,

"The purpose of the control group is to determine the relative size of the exposed and unexposed components of the source population."

In the above example, we would have identified all 1,300 cases, determined their exposure status, and ended up categorizing 700 as exposed and 600 as unexposed. We might have randomly sampled 6,000 members of the population (instead of 6 million) in order to determine the exposure distribution in the total population. If our sampling method was random, we would expect that about 1,000 would be exposed and 5,000 unexposed (the same ratio as in the overall population). We calculate a similar measure as the risk ratio above, but substituting in the denominator a sample of the population ("controls") instead of the whole population:

(700 / 1,000) / (600 / 5,000) = 0.70 / 0.12 = 5.83

Note that when we take a sample of the population, we no longer have a measure of disease frequency, because the denominator no longer represents the population. Therefore, we can no longer compute the probability or rate of disease incidence in each exposure group. We also can't calculate a risk or rate difference measure for the same reason. However, as we have seen, we can compute the relative probability of disease in the exposed vs. unexposed group. The term generally used for this measure is an odds ratio, described in more detail later in the module.

Consequently, when the outcome is uncommon, as in this case, the risk ratio can be estimated much more efficiently by using a case-control design. One would focus first on finding an adequate number of cases in order to determine the ratio of exposed to unexposed cases. Then, one only needs to take a sample of the population in order to estimate the relative size of the exposed and unexposed components of the source population. Note that if one can identify all of the cases that were reported to a registry or other database within a defined period of time, then it is possible to compute an estimate of the incidence of disease if the size of the population is known from census data. While this is conceptually possible, it is rarely done, and we will not discuss it further in this course.
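
The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the population split of 1,000,000 exposed and 5,000,000 unexposed is inferred from the 6,000,000 total and the 1:5 exposure ratio given in the text, and the control counts are the expected values for a random sample of 6,000, not observed data.

```python
def ratio_of_ratios(cases_exp, cases_unexp, denom_exp, denom_unexp):
    """(exposed cases / exposed denominator) divided by
    (unexposed cases / unexposed denominator)."""
    return (cases_exp / denom_exp) / (cases_unexp / denom_unexp)


# Case counts stated in the text.
cases_exposed, cases_unexposed = 700, 600

# Whole source population (assumed split, inferred from the text).
pop_exposed, pop_unexposed = 1_000_000, 5_000_000
risk_ratio = ratio_of_ratios(cases_exposed, cases_unexposed,
                             pop_exposed, pop_unexposed)

# Expected composition of 6,000 randomly sampled controls.
controls_exposed, controls_unexposed = 1_000, 5_000
odds_ratio = ratio_of_ratios(cases_exposed, cases_unexposed,
                             controls_exposed, controls_unexposed)

print(f"Risk ratio from the whole population: {risk_ratio:.2f}")
print(f"Odds ratio from 6,000 controls:       {odds_ratio:.2f}")
```

Because the controls preserve the 1:5 exposure ratio of the source population, the two measures agree even though the sample is a thousandth of the population's size.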

A Nested Case-Control Study

Suppose a prospective cohort study were conducted among almost 90,000 women for the purpose of studying the determinants of cancer and cardiovascular disease. After enrollment, the women provide baseline information on a host of exposures, and they also provide baseline blood and urine samples that are frozen for possible future use. The women are then followed, and, after about eight years, the investigators want to test the hypothesis that past exposure to pesticides such as DDT is a risk factor for breast cancer. Eight years have passed since the beginning of the study, and 1,439 women in the cohort have developed breast cancer. Since they froze blood samples at baseline, they have the option of analyzing all of the blood samples in order to ascertain exposure to DDT at the beginning of the study before any cancers occurred. The problem is that there are almost 90,000 women and it would cost $20 to analyze each of the blood samples. If the investigators could have analyzed all 90,000 samples, they would have found the results in the table below.

Table of Breast Cancer Occurrence Among Women With or Without DDT Exposure

If they had been able to afford analyzing all of the baseline blood specimens in order to categorize the women as having had DDT exposure or not, they would have found a risk ratio = 1.87 (95% confidence interval: 1.66-2.10). The problem is that this would have cost almost $1.8 million, and the investigators did not have the funding to do this.

While 1,439 breast cancers is a disturbing number, it is only 1.6% of the entire cohort, so the outcome is relatively rare, and it is costing a lot of money to analyze the blood specimens obtained from all of the non-diseased women. There is, however, another more efficient alternative, i.e., to use a case-control sampling strategy. One could analyze all of the blood samples from women who had developed breast cancer, but only a sample of the whole cohort in order to estimate the exposure distribution in the population that produced the cases.

If one were to analyze the blood samples of 2,878 of the non-diseased women (twice as many as the number of cases), one would obtain results that would look something like those in the next table.

Odds of exposure: 360/1,079 in the cases versus 432/2,446 in the non-diseased controls, giving an odds ratio of (360/1,079) / (432/2,446) ≈ 1.89.

Total samples analyzed = 1,439 + 2,878 = 4,317

Total cost = 4,317 x $20 = $86,340

With this approach a similar estimate of risk was obtained after analyzing blood samples from only a small sample of the entire population at a fraction of the cost with hardly any loss in precision. In essence, a case-control strategy was used, but it was conducted within the context of a prospective cohort study. This is referred to as a case-control study "nested" within a cohort study.
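
As a rough check on the claim about precision, the sketch below computes the odds ratio from the four counts given above (360 and 1,079 cases, 432 and 2,446 controls) together with an approximate 95% confidence interval. The interval uses Woolf's log-based method, which is an assumption of this example; the source does not say how its own interval for the risk ratio was obtained, and the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf's log method).

    a, b: exposed and unexposed cases
    c, d: exposed and unexposed controls
    """
    odds_ratio = (a / b) / (c / d)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the nested case-control sample described in the text.
or_est, ci_low, ci_high = odds_ratio_with_ci(360, 1079, 432, 2446)

print(f"Odds ratio: {or_est:.2f}")
print(f"Approximate 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

The point estimate lands close to the full-cohort risk ratio of 1.87, and the interval is only modestly wider than the 1.66-2.10 reported for the full cohort.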

Rothman states that one should look upon all case-control studies as being "nested" within a cohort. In other words the cohort represents the source population that gave rise to the cases. With a case-control sampling strategy one simply takes a sample of the population in order to obtain an estimate of the exposure distribution within the population that gave rise to the cases. Obviously, this is a much more efficient design.

It is important to note that, unlike cohort studies, case-control studies do not follow subjects through time. Cases are enrolled at the time they develop disease and controls are enrolled at the same time. The exposure status of each is determined, but they are not followed into the future for further development of disease.

As with cohort studies, case-control studies can be prospective or retrospective. At the start of the study, all cases might have already occurred and then this would be a retrospective case-control study. Alternatively, none of the cases might have already occurred, and new cases will be enrolled prospectively. Epidemiologists generally prefer the prospective approach because it has fewer biases, but it is more expensive and sometimes not possible. When conducted prospectively, or when nested in a prospective cohort study, it is straightforward to select controls from the population at risk. However, in retrospective case-control studies, it can be difficult to select from the population at risk, and controls are then selected from those in the population who didn't develop disease. Using only the non-diseased to select controls as opposed to the whole population means the denominator is not really a measure of disease frequency, but when the disease is rare, the odds ratio using the non-diseased will be very similar to the estimate obtained when the entire population is used to sample for controls. This phenomenon is known as the rare-disease assumption. When case-control studies were first developed, most were conducted retrospectively, and it is sometimes assumed that the rare-disease assumption applies to all case-control studies. However, it actually only applies to those case-control studies in which controls are sampled only from the non-diseased rather than the whole population.

The difference between sampling from the whole population and only the non-diseased is that the whole population contains people both with and without the disease of interest. This means that a sampling strategy that uses the whole population as its source must allow for the fact that people who develop the disease of interest can be selected as controls. Students often have a difficult time with this concept. It is helpful to remember that it seems natural that the population denominator includes people who develop the disease in a cohort study. If a case-control study is a more efficient way to obtain the information from a cohort study, then perhaps it is not so strange that the denominator in a case-control study also can include people who develop the disease. This topic is covered in more detail in EP813 Intermediate Epidemiology.

Retrospective and Prospective Case-Control Studies

Students usually think of case-control studies as being only retrospective, since the investigators enroll subjects who have developed the outcome of interest. However, case-control studies, like cohort studies, can be either retrospective or prospective. In a prospective case-control study, the investigator still enrolls based on outcome status, but the investigator must wait for the cases to occur.

When is a Case-Control Study Desirable?

Given the greater efficiency of case-control studies, they are particularly advantageous in the following situations:

  • When the disease or outcome being studied is rare.
  • When the disease or outcome has a long induction and latent period (i.e., a long time between exposure and the eventual causal manifestation of disease).
  • When exposure data is difficult or expensive to obtain.
  • When the study population is dynamic.
  • When little is known about the risk factors for the disease, case-control studies provide a way of testing associations with multiple potential risk factors. (This isn't really a unique advantage to case-control studies, however, since cohort studies can also assess multiple exposures.)

Another advantage of their greater efficiency, of course, is that they are less time-consuming and much less costly than prospective cohort studies.

The DES Case-Control Study

A classic example of the efficiency of the case-control approach is the study (Herbst et al., N. Engl. J. Med. 1971;284:878-81) that linked in-utero exposure to diethylstilbestrol (DES) with subsequent development of vaginal cancer 15-22 years later. In the late 1960s, physicians at MGH identified a very unusual cancer cluster. Eight young women between the ages of 15 and 22 were found to have cancer of the vagina, an uncommon cancer even in elderly women. The cluster of cases in young women was initially reported as a case series, but there were no strong hypotheses about the cause.

In retrospect, the cause was in-utero exposure to DES. After World War II, DES started being prescribed for women who were having trouble with a pregnancy; if there were signs suggesting the possibility of a miscarriage, DES was frequently prescribed. It has been estimated that between 1945 and 1950 DES was prescribed for about 20% of all pregnancies in the Boston area. Thus, the unborn fetus was exposed to DES in utero, and in a very small percentage of cases this resulted in development of vaginal cancer when the child was 15-22 years old (a very long latent period). There were several reasons why a case-control study was the only feasible way to identify this association: the disease was extremely rare (even in subjects who had been exposed to DES), there was a very long latent period between exposure and development of disease, and initially they had no idea what was responsible, so there were many possible exposures to consider.

In this situation, a case-control study was the only reasonable approach to identify the causative agent. Given how uncommon the outcome was, even a large prospective study would have been unlikely to have more than one or two cases, even after 15-20 years of follow-up. Similarly, a retrospective cohort study might have been successful in enrolling a large number of subjects, but the outcome of interest was so uncommon that few, if any, subjects would have had it. In contrast, a case-control study was conducted in which eight known cases and 32 age-matched controls provided information on many potential exposures. This strategy ultimately allowed the investigators to identify a highly significant association between the mother's treatment with DES during pregnancy and the eventual development of adenocarcinoma of the vagina in their daughters (in-utero at the time of exposure) 15 to 22 years later.

For more information see the DES Fact Sheet from the National Cancer Institute.

An excellent summary of this landmark study and the long-range effects of DES can be found in a Perspective article in the New England Journal of Medicine. A cohort of both mothers who took DES and their children (daughters and sons) was later formed to look for more common outcomes. Members of the faculty at BUSPH are on the team of investigators that follow this cohort for a variety of outcomes, particularly reproductive consequences and other cancers.

Selecting & Defining Cases and Controls

The "case" definition.

Careful thought should be given to the case definition to be used. If the definition is too broad or vague, it is easier to capture people with the outcome of interest, but a loose case definition will also capture people who do not have the disease. On the other hand, if an overly restrictive case definition is employed, fewer cases will be captured, and the sample size may be limited. Investigators frequently wrestle with this problem during outbreak investigations. Initially, they will often use a somewhat broad definition in order to identify potential cases. However, as an outbreak investigation progresses, there is a tendency to narrow the case definition to make it more precise and specific, for example by requiring confirmation of the diagnosis by laboratory testing. In general, investigators conducting case-control studies should thoughtfully construct a definition that is as clear and specific as possible without being overly restrictive.

Investigators studying chronic diseases generally prefer newly diagnosed cases, because they tend to be more motivated to participate, may remember relevant exposures more accurately, and because it avoids complicating factors related to selection of longer duration (i.e., prevalent) cases. However, it is sometimes impossible to have an adequate sample size if only recent cases are enrolled.

Sources of Cases

Typical sources for cases include:

  • Patient rosters at medical facilities
  • Death certificates
  • Disease registries (e.g., cancer or birth defect registries; the SEER Program [Surveillance, Epidemiology and End Results] is a federally funded program that identifies newly diagnosed cases of cancer in population-based registries across the US)
  • Cross-sectional surveys (e.g., NHANES, the National Health and Nutrition Examination Survey)

Selection of the Controls

As noted above, it is always useful to think of a case-control study as being nested within some sort of a cohort, i.e., a source population that produced the cases that were identified and enrolled. In view of this there are two key principles that should be followed in selecting controls:

  • The comparison group ("controls") should be representative of the source population that produced the cases.
  • The "controls" must be sampled in a way that is independent of the exposure, meaning that their selection should not be more (or less) likely if they have the exposure of interest.

If either of these principles is not adhered to, selection bias can result (as discussed in detail in the module on Bias).

Note that in the earlier example of a case-control study conducted in the Massachusetts population, we specified that our sampling method was random so that exposed and unexposed members of the population had an equal chance of being selected. Therefore, we would expect that about 1,000 would be exposed and 5,000 unexposed (the same ratio as in the whole population), and we came up with an odds ratio that was the same as the hypothetical risk ratio we would have had if we had collected exposure information from the whole population of six million:

Odds ratio = (700 / 1,000) / (600 / 5,000) = 5.83

What if we had instead been more likely to sample those who were exposed, so that we instead found 1,500 exposed and 4,500 unexposed among the 6,000 controls? Then the odds ratio would have been:

Odds ratio = (700 / 1,500) / (600 / 4,500) = 0.467 / 0.133 = 3.50

This odds ratio is biased because it differs from the true odds ratio. In this case, the bias stemmed from the fact that we violated the second principle in selection of controls. Depending on which category is over or under-sampled, this type of bias can result in either an underestimate or an overestimate of the true association.
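
The effect of exposure-dependent control sampling can be sketched numerically. The first two scenarios use the counts from the text; the third, in which exposed people are under-sampled (500 exposed and 5,500 unexposed controls), is a hypothetical added here only to show bias in the other direction. The function name is also an assumption of this example.

```python
def case_control_odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Exposure odds among cases divided by exposure odds among controls."""
    return (cases_exp / cases_unexp) / (controls_exp / controls_unexp)


cases = (700, 600)  # exposed and unexposed cases, from the text

scenarios = {
    "random sampling (1,000 / 5,000)": (1_000, 5_000),
    "exposed over-sampled (1,500 / 4,500)": (1_500, 4_500),
    "exposed under-sampled (500 / 5,500), hypothetical": (500, 5_500),
}

results = {}
for label, controls in scenarios.items():
    results[label] = case_control_odds_ratio(*cases, *controls)
    print(f"{label}: OR = {results[label]:.2f}")
```

Over-sampling exposed controls pulls the estimate toward the null, while under-sampling them pushes it away from the null.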

A hypothetical case-control study was conducted to determine whether lower socioeconomic status (the exposure) is associated with a higher risk of cervical cancer (the outcome). The "cases" consisted of 250 women who were referred to Massachusetts General Hospital (MGH) for treatment of cervical cancer. They were referred from all over the state. The cases were asked a series of questions relating to socioeconomic status (household income, employment, education, etc.). The investigators identified control subjects by going door-to-door in the community around MGH from 9:00 AM to 5:00 PM. Many residents were not home, but the investigators persisted and eventually enrolled enough controls. The problem is that the controls were selected by a different mechanism than the cases, AND the selection mechanism may have tended to select individuals of different socioeconomic status, since women who were at home may have been somewhat more likely to be unemployed. In other words, the controls were more likely to be enrolled (selected) if they had the exposure of interest (lower socioeconomic status).

Sources for "Controls"

Population controls:

A population-based case-control study is one in which the cases come from a precisely defined population, such as a fixed geographic area, and the controls are sampled directly from the same population. In this situation cases might be identified from a state cancer registry, for example, and the comparison group would logically be selected at random from the same source population. Population controls can be identified from voter registration lists, tax rolls, drivers license lists, and telephone directories or by "random digit dialing". Population controls may also be more difficult to obtain, however, because of lack of interest in participating, and there may be recall bias, since population controls are generally healthy and may remember past exposures less accurately.

Example of a Population-based Case-Control Study: Rollison et al. reported on a "Population-based Case-Control Study of Diabetes and Breast Cancer Risk in Hispanic and Non-Hispanic White Women Living in US Southwestern States" (Am J Epidemiol 2008;167:447–456).

"Briefly, a population-based case-control study of breast cancer was conducted in Colorado, New Mexico, Utah, and selected counties of Arizona. For investigation of differences in the breast cancer risk profiles of non-Hispanic Whites and Hispanics, sampling was stratified by race/ethnicity, and only women who self-reported their race as non-Hispanic White, Hispanic, or American Indian were eligible, with the exception of American Indian women living on reservations. Women diagnosed with histologically confirmed breast cancer between October 1999 and May 2004 (International Classification of Diseases for Oncology codes C50.0–C50.6 and C50.8–C50.9) were identified as cases through population-based cancer registries in each state."

"Population-based controls were frequency-matched to cases in 5-year age groups. In New Mexico and Utah, control participants under age 65 years were randomly selected from driver's license lists; in Arizona and Colorado, controls were randomly selected from commercial mailing lists, since driver's license lists were unavailable. In all states, women aged 65 years or older were randomly selected from the lists of the Centers for Medicare and Medicaid Services (Social Security lists). Of all women contacted, 68 percent of cases and 42 percent of controls participated in the study."

"Odds ratios and 95% confidence intervals were calculated using logistic regression, adjusting for age, body mass index at age 15 years, and parity. Having any type of diabetes was not associated with breast cancer overall (odds ratio = 0.94, 95% confidence interval: 0.78, 1.12). Type 2 diabetes was observed among 19% of Hispanics and 9% of non-Hispanic Whites but was not associated with breast cancer in either group."

In this example, it is clear that the controls were selected from the source population (principle 1), but less clear that they were enrolled independent of exposure status (principle 2), both because drivers' licenses were used for selection and because the participation rate among controls was low. These factors would only matter if they impacted on the estimate of the proportion of the population who had diabetes.

Hospital or Clinic Controls:

Controls drawn from patients at the same hospital or clinic as the cases should meet two criteria:

  • First, they should have diseases that are unrelated to the exposure being studied. For example, for a study examining the association between smoking and lung cancer, it would not be appropriate to include patients with cardiovascular disease as controls, since smoking is a risk factor for cardiovascular disease. To include such patients as controls would result in an underestimate of the true association.
  • Second, control patients in the comparison should have diseases with similar referral patterns as the cases, in order to minimize selection bias. For example, if the cases are women with cervical cancer who have been referred from all over the state, it would be inappropriate to use controls consisting of women with diabetes who had been referred primarily from local health centers in the immediate vicinity of the hospital. Similarly, it would be inappropriate to use patients from the emergency room, because the selection of a hospital for an emergency is different than for cancer, and this difference might be related to the exposure of interest.

The advantages of using controls who are patients from the same facility are:

  • They are easier to identify
  • They are more likely to participate than general population controls.
  • They minimize selection bias because they generally come from the same source population (provided referral patterns are similar).
  • Recall bias would be minimized, because they are sick, but with a different diagnosis.

Example: Several years ago the vascular surgeons at Boston Medical Center wanted to study risk factors for severe atherosclerosis of the lower extremities. The cases were patients who were referred to the hospital for elective surgery to bypass severe atherosclerotic blockages in the arteries to the legs. The controls consisted of patients who were admitted to the same hospital for elective joint replacement of the hip or knee. The patients undergoing joint replacement were similar in age and they also were following the same referral pathways. In other words, they met the "would" criterion: if one of the joint replacement surgery patients had developed severe atherosclerosis in their leg arteries, they would have been referred to the same hospital.

Friend, Neighbor, Spouse, and Relative Controls:

Occasionally investigators will ask cases to nominate controls who are in one of these categories, because they have similar characteristics, such as genotype, socioeconomic status, or environment, i.e., factors that can cause confounding, but are hard to measure and adjust for. By matching cases and controls on these factors, confounding by these factors will be controlled. However, one must be careful that the controls satisfy the two fundamental principles. Often, they do not.

How Many Controls?

Since case-control studies are often used for uncommon outcomes, investigators often have a limited number of cases but a plentiful supply of potential controls. In this situation the statistical power of the study can be increased somewhat by enrolling more controls than cases. However, the additional power that is achieved diminishes as the ratio of controls to cases increases, and ratios greater than 4:1 have little additional impact on power. Consequently, if it is time-consuming or expensive to collect data on controls, the ratio of controls to cases should be no more than 4:1. However, if the data on controls is easily obtained, there is no reason to limit the number of controls.
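
The diminishing return from adding controls can be illustrated by tracking the standard error of the log odds ratio as the control-to-case ratio grows. This sketch uses precision as a stand-in for power, and every number in it is hypothetical: 100 cases, 30% of cases exposed, 15% of controls exposed. The standard error formula is Woolf's approximation, chosen for this example and not taken from the source.

```python
import math


def se_log_odds_ratio(n_cases, ratio, p_exp_cases, p_exp_controls):
    """Approximate standard error of log(OR) using expected cell counts."""
    n_controls = n_cases * ratio
    a = n_cases * p_exp_cases              # exposed cases
    b = n_cases * (1 - p_exp_cases)        # unexposed cases
    c = n_controls * p_exp_controls        # exposed controls
    d = n_controls * (1 - p_exp_controls)  # unexposed controls
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# Hypothetical study: 100 cases, exposure in 30% of cases and 15% of controls.
se_by_ratio = {
    ratio: se_log_odds_ratio(100, ratio, 0.30, 0.15)
    for ratio in (1, 2, 3, 4, 5, 10)
}

for ratio, se in se_by_ratio.items():
    print(f"{ratio:>2} controls per case: SE(log OR) = {se:.3f}")
```

In this setup, going from one to two controls per case shrinks the standard error far more than going from four to five, because the fixed number of cases increasingly dominates the variance.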

Methods of Control Sampling

There are three strategies for selecting controls that are best explained by considering the nested case-control study described earlier in this module:

  • Survivor sampling: This is the most common method. Controls consist of individuals from the source population who do not have the outcome of interest.
  • Case-base sampling (also known as "case-cohort" sampling): Controls are selected from the population at risk at the beginning of the follow-up period in the cohort study within which the case-control study was nested.
  • Risk Set Sampling: In the nested case-control study a control would be selected from the population at risk at the point in time when a case was diagnosed.

The Rare Outcome Assumption

It is often said that an odds ratio provides a good estimate of the risk ratio only when the outcome of interest is rare, but this is only true when survivor sampling is used. With case-base sampling the odds ratio will provide a good estimate of the risk ratio, and with risk set sampling a good estimate of the rate ratio, regardless of the frequency of the outcome, because the controls will provide an accurate estimate of the exposure distribution in the whole source population (i.e., not just in non-diseased people).
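A small numerical sketch may make this concrete. The cohort counts below are hypothetical, and the functions use the expected exposure odds among controls (that is, they ignore sampling error). The sketch compares survivor sampling with case-base sampling only; risk set sampling would require follow-up times and is not modelled here.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk ratio in the full cohort."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def odds_ratio_survivor(cases_exp, n_exp, cases_unexp, n_unexp):
    """Expected odds ratio when controls are drawn from non-cases only."""
    control_exposure_odds = (n_exp - cases_exp) / (n_unexp - cases_unexp)
    return (cases_exp / cases_unexp) / control_exposure_odds


def odds_ratio_case_base(cases_exp, n_exp, cases_unexp, n_unexp):
    """Expected odds ratio when controls are drawn from everyone
    at risk at the start of follow-up (cases included)."""
    control_exposure_odds = n_exp / n_unexp
    return (cases_exp / cases_unexp) / control_exposure_odds


if __name__ == "__main__":
    # Hypothetical cohorts: (exposed cases, exposed total,
    #                        unexposed cases, unexposed total)
    scenarios = {
        "common outcome": (400, 1000, 200, 1000),
        "rare outcome": (4, 1000, 2, 1000),
    }
    for label, counts in scenarios.items():
        print(
            f"{label}: RR={risk_ratio(*counts):.2f}, "
            f"survivor OR={odds_ratio_survivor(*counts):.2f}, "
            f"case-base OR={odds_ratio_case_base(*counts):.2f}"
        )
```

In the hypothetical common-outcome cohort the risk ratio is 2.00, the survivor-sampling odds ratio is about 2.67, and the case-base odds ratio is 2.00. In the rare-outcome cohort all three are approximately 2.00.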

More on Selection Bias

Always consider the source population for case-control studies, i.e. the "population" that generated the cases. The cases are always identified and enrolled by some method or a set of procedures or circumstances. For example, cases with a certain disease might be referred to a particular tertiary hospital for specialized treatment. Alternatively, if there is a database or a disease registry for a geographic area, cases might be selected at random from the database. The key to avoiding selection bias is to select the controls by a similar, if not identical, mechanism in order to ensure that the controls provide an accurate representation of the exposure status of the source population.

Example 1: In the first example above, in which cases were randomly selected from a geographically defined database, the source population is also defined geographically, so it would make sense to select population controls by some random method. In contrast, if one enrolled controls from a particular hospital within the geographic area, one would have to at least consider whether the controls were inherently more or less likely to have the exposure of interest. If so, they would not provide an accurate estimate of the exposure distribution of the source population, and selection bias would result.

Example 2: In the second example above, the source population was defined by the patterns of referral to a particular hospital for a particular disease. In order for the controls to be representative of the "population" that produced those cases, the controls should be selected by a similar mechanism, e.g., by contacting the referring health care providers and asking them to provide the names of potential controls. By this mechanism, one can ensure that the controls are representative of the source population, because if they had had the disease of interest they would have been just as likely as the cases to have been included in the case group (thus fulfilling the "would" criterion).

Example 3: A food handler at a delicatessen who is infected with hepatitis A virus is responsible for an outbreak of hepatitis which is largely confined to the surrounding community from which most of the customers come. Many (but not all) of the infected cases are identified by passive and active surveillance. How should controls be selected? In this situation, one might guess that the likelihood of people going to the delicatessen would be heavily influenced by their proximity to it, and this would to a large extent define the source population. In a case-control study undertaken to identify the source, the delicatessen is one of the exposures being tested. Consequently, even if the cases were reported to the state-wide surveillance system, it would not be appropriate to randomly select controls from the state, the county, or even the town where the delicatessen is located. In other words, the "would" criterion alone is not enough here: anyone in the state with clinical hepatitis would end up in the surveillance system, but someone who lived far from the deli would have a much lower likelihood of having the exposure. A better approach would be to select controls who were matched to the cases by neighborhood, age, and gender. These controls would have had similar access to the deli if they chose to go, and they would therefore be more representative of the source population.

Analysis of Case-Control Studies

The computation and interpretation of the odds ratio in a case-control study have already been discussed in the modules on Overview of Analytic Studies and Measures of Association. Additionally, one can compute the confidence interval for the odds ratio, and statistical significance can be evaluated by using a chi-square test (or a Fisher's Exact Test if the sample size is small) to compute a p-value. These calculations can be done using the Case-Control worksheet in the Excel file called EpiTools.XLS.

Image of the Case-Control worksheet in the Epi_Tools file
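For readers who prefer code to a spreadsheet, the same quantities can be sketched in a few lines of Python. This is not the EpiTools worksheet's implementation. It assumes the log-based (Woolf) method for the 95% confidence interval and a Pearson chi-square test without continuity correction; the function name and the example counts are hypothetical. It does not cover Fisher's Exact Test, which would be preferable for small cell counts.

```python
import math


def analyze_case_control(a, b, c, d):
    """Analyze a 2x2 case-control table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Returns (odds ratio, CI lower, CI upper, chi-square, p-value).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")

    odds_ratio = (a * d) / (b * c)

    # Assumed method: Woolf (log-based) 95% confidence interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = 1.959964  # standard normal quantile for a 95% interval
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)

    # Pearson chi-square statistic (1 degree of freedom, no correction).
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper-tail probability of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return odds_ratio, ci_low, ci_high, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    or_, low, high, chi2, p = analyze_case_control(a=30, b=20, c=70, d=80)
    print(f"OR = {or_:.2f} (95% CI {low:.2f} to {high:.2f})")
    print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

A confidence interval that includes 1.0 and a p-value above 0.05 would both indicate that the observed association is compatible with chance at the conventional threshold.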

Advantages and Disadvantages of Case-Control Studies

Advantages:

  • They are efficient for rare diseases or diseases with a long latency period between exposure and disease manifestation.
  • They are less costly and less time-consuming than cohort studies; they are advantageous when exposure data is expensive or hard to obtain.
  • They are advantageous when studying dynamic populations in which follow-up is difficult.

Disadvantages:

  • They are subject to selection bias.
  • They are inefficient for rare exposures.
  • Information on exposure is subject to observation bias (for example, recall bias).
  • They generally do not allow calculation of incidence (absolute risk).
