1 Introduction
A major aim of epidemiologic research is to measure disease occurrence in relation to various characteristics such as exposure to environmental, occupational, or lifestyle risk factors, genetic traits, or other features. The generic term exposure will be used throughout this chapter to denote these characteristics. We will start by reviewing various measures that are at the root of quantitative epidemiologic thinking. These include measures that quantify disease occurrence, associations between disease occurrence and exposures, as well as their consequences in terms of disease risk (Section 2). Emphasis will be placed on measures based on the occurrence of new disease cases, referred to as disease incidence. Measures based on disease prevalence, i.e., considering previously existing disease cases as well as new cases, will be mentioned only in passing.
In Sections 3–7, we will focus on the main measure of impact at the population level, namely the attributable risk. This measure will be introduced in some detail in Section 3. Then, we will successively review three specific problems regarding attributable risk. First, we will consider adjusted attributable risk estimation from epidemiologic study data in Section 4, an issue that has generated intensive methodological research in the last 20 years, resulting in essentially satisfactory solutions. Second, we will discuss the lack of additivity of attributable risk contributions for separate exposures and present a possible solution in Section 5. Third, we will examine conceptual issues involved in interpreting attributable risk estimates in Section 6. Final remarks will follow in Section 7.
2 Rates, risks and measures of association
2.1 Incidence and hazard rates
The incidence rate of a given disease is the number of persons who develop the disease (number of incident cases) divided by the person-time at risk accumulated by subjects of the source population over a defined period of time or age. Incidence rates are not interpretable as probabilities. While they have a lower bound of zero, they have no upper bound. Units of incidence rates are reciprocals of person-time, such as reciprocals of person-years or of multiples of person-years (e.g., per 100 000 person-years). For instance, if five cases develop from the follow-up of 50 subjects with a total follow-up time of two years per subject, the incidence rate is 5/(50 × 2) = 5/100 = 0.05 per person-year, or equivalently 5 000 per 100 000 person-years.
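As a minimal arithmetic sketch of this computation (the numbers are those of the example above):

```python
# Incidence rate = incident cases / accumulated person-time.
n_cases = 5
n_subjects = 50
years_per_subject = 2.0

person_years = n_subjects * years_per_subject   # 100 person-years
incidence_rate = n_cases / person_years         # 0.05 per person-year

# Rates are often reported per a multiple of person-time,
# e.g., per 100 000 person-years.
rate_per_100k = incidence_rate * 100_000
```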
Synonyms for incidence rate are average incidence rate, force of morbidity, person-time rate, or incidence density [1], the last term reflecting the interpretation of an incidence rate as the density of incident case occurrences in an accumulated amount of person-time [2]. Mortality rates (overall or cause-specific) can be regarded as a special case of incidence rates, the outcome considered being death rather than disease occurrence.
Incidence rates can be regarded as estimates of a limiting theoretical quantity, namely the hazard rate $\lambda(t)$, defined as the instantaneous probability of developing disease at time $t$, conditional on being at risk (alive and disease-free) just before $t$:

$$\lambda(t) = \lim_{h \to 0} \frac{\Pr(t \le T < t + h \mid T \ge t)}{h}, \qquad (1)$$

where $T$ denotes the time of disease occurrence. An incidence rate computed over a given interval validly estimates $\lambda(t)$ when the hazard is approximately constant over that interval.
Strictly speaking, incidence and hazard rates do not coincide. Hazard rates are formally defined as theoretical functions of time, whereas incidence rates are defined directly as estimates and constitute valid estimates of hazard rates under certain assumptions (see above).
From the definitions above, it follows that individual follow-up data are needed to obtain incidence rates or estimate hazard rates. The cohort design, which involves the follow-up of subjects with various profiles of exposure, is the ideal design to obtain incidence or hazard rates for various levels or profiles of exposure, i.e., exposure-specific incidence or hazard rates. In many applications, however, obtaining exposure-specific incidence rates is not trivial. Indeed, several exposures are often considered, some with several exposed levels and some continuous. Moreover, it may be necessary to account for confounders or effect-modifiers. Hence, estimation often requires modelling. As an alternative to the cohort design, in the absence of individual follow-up data, person-time at risk can be estimated as the width of the time period times the population size at its midpoint. This estimate rests on the assumption that individuals who cease to be at risk, either because they die or develop the disease, or because they migrate in or out, do so evenly across the time interval. Thus, population data such as registry data can be used to estimate incidence rates as long as an exhaustive census of incident cases can be obtained.
Case-control data pose a more difficult problem than cohort data, because case-control data alone are not sufficient to yield incidence or hazard rates. Indeed, they provide data on the distributions of exposure respectively in diseased subjects (cases) and non-diseased subjects (controls) for the disease under study, which can be used to estimate odds ratios (see Section 2.3) but are not sufficient to estimate exposure-specific incidence rates. However, it is possible to arrive at exposure-specific incidence rates from case-control data if case-control data are complemented by either follow-up data or population data, which happens for nested or population-based case-control studies. In a nested case-control study, the cases and controls are selected from a follow-up study. In a population-based case-control study, they are selected from a specified population in which an effort is made to identify all incident cases diagnosed during a fixed time interval, usually in a grouped form (i.e., number of cases and number of subjects by age group). In both situations, full information on exposure is obtained only for cases and controls. Additionally, complementary information on composite incidence (i.e., counts of events and person-time irrespective of exposure) can be sought from the follow-up or population data. By combining this information with odds ratio estimates, exposure-specific incidence rates can be obtained as has long been recognized [1,3–7] and is a consequence of the relation [6,8]:
$$\lambda_0(t) = \lambda(t)\,[1 - \mathrm{AR}(t)], \qquad (2)$$

where $\lambda(t)$ denotes the composite hazard in the population at time $t$, $\lambda_0(t)$ the hazard in subjects at the reference (unexposed) level, and $\mathrm{AR}(t)$ the attributable risk for the exposure at time $t$ (see Section 3), which is estimable from the case-control data; exposure-specific hazards then follow by multiplying $\lambda_0(t)$ by the corresponding rate (odds) ratio estimates.
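The following sketch illustrates numerically how case-control information can be combined with a composite incidence rate; all values (overall rate, exposure prevalence in cases, odds ratio) are invented for illustration, and the attributable-risk expression used is Miettinen's formula, Eq. (7) of Section 3:

```python
import math

# Composite incidence rate from population/registry data (per person-year).
overall_rate = 0.001
# From the case-control sample (illustrative values):
p_exp_cases = 0.5   # prevalence of exposure among cases
odds_ratio = 2.0    # used as the rate-ratio estimate (rare disease)

# Miettinen's formula (Eq. (7)): AR = p_{E|D} (RR - 1) / RR.
ar = p_exp_cases * (odds_ratio - 1.0) / odds_ratio    # 0.25
rate_unexposed = overall_rate * (1.0 - ar)            # baseline rate
rate_exposed = rate_unexposed * odds_ratio            # exposed rate

# Consistency check: the exposure-specific rates must average back to the
# composite rate, with the population exposure prevalence from Bayes' rule.
p_exp_pop = p_exp_cases * overall_rate / rate_exposed
recomposed = rate_exposed * p_exp_pop + rate_unexposed * (1 - p_exp_pop)
assert math.isclose(recomposed, overall_rate)
```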
Finally, cross-sectional designs in which a sample of the population is assessed for both exposure and disease status cannot provide any assessment of incidence rates but instead will yield estimates of disease prevalence proportions.
Exposure-specific incidence and hazard rates play a central role in quantitative epidemiology because, as will be apparent from the following sections, all measures of the disease risk, association and impact can be derived from them.
2.2 Measures of the disease risk
The disease risk is defined as the probability that an individual who is initially disease-free will develop a given disease over a specified time or age interval (e.g., one year, five years, or lifetime).
If the interval starting at time $a$ and of width $\tau$ is considered, the disease risk $\pi(a, a+\tau)$ can be written as

$$\pi(a, a+\tau) = \int_a^{a+\tau} \lambda(t)\,\frac{S(t)}{S(a)}\,dt, \qquad (3)$$

where $\lambda(t)$ denotes the disease hazard at time $t$ and $S(t)$ the probability of being alive and disease-free at time $t$. In the simplest case, with no competing risks, $S(t)/S(a) = \exp\{-\int_a^t \lambda(u)\,du\}$ and the risk reduces to

$$\pi(a, a+\tau) = 1 - \exp\Big\{-\int_a^{a+\tau} \lambda(u)\,du\Big\}. \qquad (4)$$
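As a quick numerical illustration of the risk definition: for a constant hazard λ over an interval of length τ and no competing risks, the risk reduces to 1 − exp(−λτ). The sketch below (illustrative values only) checks this closed form against direct numerical integration of the hazard-times-survival integrand:

```python
import math

hazard = 0.01   # constant disease hazard, per year (illustrative)
tau = 10.0      # interval width, years (illustrative)

# Closed form: risk = 1 - exp(-hazard * tau).
risk_closed = 1.0 - math.exp(-hazard * tau)

# Midpoint-rule integration of  ∫ λ S(t) dt  with S(t) = exp(-λ t),
# which must agree with the closed form.
n = 100_000
dt = tau / n
risk_numeric = sum(hazard * math.exp(-hazard * (i + 0.5) * dt) * dt
                   for i in range(n))

assert abs(risk_closed - risk_numeric) < 1e-6
# Note: for small λτ, risk ≈ λτ (the cumulative rate): here 0.1 vs ≈ 0.095.
```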
By specializing the meaning of the functions $\lambda$ and $S$ in Eqs. (3) and (4), several distinct definitions of the disease risk are obtained; they differ in three main respects. First, the time scale and interval considered may vary: risk may be defined over age intervals, over calendar-time intervals, or, in clinical applications, over time elapsed since diagnosis or treatment.
Second, the risk definition may or may not account for individual exposure profiles. If no risk factors are considered in estimating the disease hazard, the corresponding measure of disease risk defines the average or composite risk over the entire population, which includes subjects with various exposure profiles. This measure, also called cumulative incidence [1], may be of value at the population level. However, the main usefulness of risk is in quantifying an individual's predicted probability of developing disease depending on the individual's exposure profile. Thus, estimates of the exposure-specific disease hazard have to be available for such exposure-specific risk (also called individualized or absolute risk) to be estimated.
Third, the definition depends on whether competing risks are taken into account through the survival function $S$: if deaths from competing causes are considered, $S(t)$ denotes the probability of being alive and disease-free at time $t$, and the resulting risk is correspondingly reduced relative to a definition that ignores competing risks.
From the definition above, it appears that the disease risk depends on the incidence rate of disease in the population considered and, if individual risk is considered, on the strength of the relationship between exposures and disease. One consequence is that risk estimates may not be portable from one population to another, as incidence rates may vary widely among populations that are separated in time and location, or even among subgroups of populations, possibly because of differing genetic patterns or differing exposure to unknown risk factors. Additionally, competing causes of death (competing risks) may also have different patterns among different populations, which might also influence the values of the disease risk.
The disease risk is a probability and therefore lies between 0 and 1, and is dimensionless. A value of 0, while theoretically possible, would correspond to very special cases such as a purely genetic disease for an individual not carrying the disease gene. A value of 1 would be even more unusual and might again correspond to a genetic disease with a penetrance of 1 for a gene carrier but, even in this case, the value should be less than 1 if competing risks are accounted for.
Besides the term ‘disease risk’, ‘absolute risk’ or ‘absolute cause-specific risk’ have been used by several authors [10–14]. Alternative terms include ‘individualized risk’ [8], ‘individual risk’ [15], ‘crude probability’ [16], ‘crude incidence’ [17], ‘cumulative incidence’ [1,18], ‘cumulative incidence risk’ [6], and ‘absolute incidence risk’ [1]. The term ‘cumulative risk’ refers to the quantity $\int_a^{a+\tau} \lambda(u)\,du$, i.e., the cumulative hazard over the interval, which closely approximates the disease risk when that risk is small.
Upon taking individual exposure profiles into account, resulting individual disease risk estimates are useful in providing an individual measure of the probability of disease occurrence, and can therefore be useful in counselling (e.g., in breast cancer, see [8,21–23]). Individual risk is also useful in designing (i.e., for sample size calculations and definition of eligibility criteria) and interpreting trials of interventions to prevent the occurrence of a disease through a risk–benefit analysis [24]. The concept of risk is also useful in clinical epidemiology as a measure of the individualized probability of an adverse event, such as a recurrence or death in diseased subjects. In that context, it can serve as a useful tool to help define individual patient management and, for instance, the absolute risk of recurrence in the next three years might be an important element in deciding whether to prescribe an aggressive and potentially toxic treatment regimen [11,17].
As is evident from its definition, the disease risk can only be estimated and interpreted in reference to a specified age or time interval. One might be interested in short time spans (e.g., five years), or long time spans (e.g., 30 years). Of course, the disease risk increases as the time span increases. Sometimes, the time span is variable such as in lifetime risk. The disease risk can be influenced strongly by the intensity of competing risks (typically competing causes of death, see above). It varies inversely as a function of death rates from other causes.
It follows from its definition that the disease risk is estimable as long as hazard rates for the disease of interest are estimable. Therefore, it is directly estimable from cohort data, but case-control data have to be complemented with follow-up or population data in order to obtain the necessary complementary information on incidence rates (see Section 2.1).
Interpretation, usefulness, and properties of the disease risk, as well as methods for its estimation from cohort data, population-based or nested case-control data have been reviewed in detail [10].
2.3 Measures of association
Measures of association assess the strength of associations between one or several exposures and the risk of developing a given disease. Thus, they are useful in aetiologic research to assess and quantify associations between potential risk (or protective) factors and disease risk. The question addressed is whether and to what degree a given exposure is associated with the occurrence of the disease of interest. In fact, this is the primary question that most epidemiologic studies are trying to answer.
Depending on the available data, measures of association may be based on disease rates, disease risks, or even disease odds, i.e., the ratio $\pi/(1-\pi)$ of the disease risk $\pi$ to its complement.
Measures of association can be defined for categorical or continuous exposures. For categorical exposures, any two exposure levels can be contrasted using the measures of association defined below. However, it is convenient to define a reference level to which any exposure level can be contrasted. This choice is sometimes natural (e.g., non-smokers in assessing the association of smoking with disease occurrence), but can be more problematic if the exposure considered is continuous, where a range of low exposures may be considered potentially inconsequential. The choice of a reference range is important for interpreting results. It should be wide enough for estimates of measures of association to be reasonably precise. However, it should not be so wide that it compromises meaningful interpretation of the results, which depends critically on the homogeneity of the reference level. For continuous exposures, measures of association can also be expressed per unit of exposure, e.g., for each additional gram of daily alcohol consumption. The reference level may then be a precise value such as no daily alcohol consumption or a range of values such as less than 10 grams of daily alcohol consumption.
When computing a measure of association, it is usually assumed that the relationship being captured has the potential to be causal, and efforts are taken to remove the impact of confounders from the quantity. Nonetheless, except for the special case of randomized studies, most investigators retain the word ‘association’ rather than ‘effect’ when describing the relationship between exposure and outcome to emphasize the possibility that unknown confounders may still influence the relationship.
Ratio-based measures of association are particularly appropriate when the effect of the exposure is multiplicative, which means that there is a similar percent increase or decrease associated with exposure in rate, risk or odds across exposure subgroups. Effects have often been observed to be multiplicative, leading to ratios providing a simple description of the association (e.g., see [25 (Chapter 2)]). Ratio measures are dimensionless and range from 0 to infinity, with 1 designating no association of the exposure with the outcome. When the outcome is death or disease, and the ratio has the rate, risk, or odds of the outcome with the exposed group in the numerator, a value less than 1 indicates a protective effect of exposure. The exposure is then referred to as a protective factor. When the ratio in this set-up is greater than 1, there is greater disease occurrence with exposure, and the exposure is then referred to as a risk factor.
The rate ratio is the ratio of the disease rate among those exposed to that among those not exposed, $\lambda_1/\lambda_0$, where $\lambda_1$ and $\lambda_0$ denote the rates in exposed and unexposed subjects, respectively.
Rate ratios refer to population dynamics, and are not as easily interpretable on the individual level. It has been argued, however, that rate ratios make more sense than risk ratios when the period subjects are at risk is longer than the observation period [26 (Chapter 8)]. Numerically, the rate ratio is further from the null than the risk ratio. When rates are low, the similarity of risks and rates leads to rate ratios being close to risk ratios, as discussed below. Further considerations of how the rate ratio relates to other ratio-based measures of association are offered by Rothman and Greenland [20 (p. 50)].
The risk ratio, relative risk, or ratio of risks of disease among those exposed and those not exposed is defined analogously as $\pi_1/\pi_0$, where $\pi_1$ and $\pi_0$ denote the risks in exposed and unexposed subjects, respectively.
For several reasons, the odds ratio has emerged as the most popular measure of association. The odds ratio is the ratio of the disease odds among those exposed to that among those not exposed, $[\pi_1/(1-\pi_1)]/[\pi_0/(1-\pi_0)]$.
It can be shown that numerically the odds ratio falls the furthest from the null, and the risk ratio the closest, with the rate ratio in between. For example, from Table 1, based on fictitious data from a cohort study for a disease that is not rare, we would obtain a risk ratio of $(40/100)/(20/100) = 2.0$ and an odds ratio of $(40 \times 80)/(20 \times 60) \approx 2.67$, with the rate ratio falling between these two values.
Table 1. Data from the fictitious cohort study
Exposed | Unexposed | |
Diseased | 40 | 20 |
Non-diseased | 60 | 80 |
The difference in magnitude between the above ratio measures is important to keep in mind when interpreting them for diseases or outcomes that are not rare. For rare outcomes, the values of the three ratio measures tend to be close.
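The ordering of the ratio measures can be checked directly on the counts of Table 1; this is a plain arithmetic sketch:

```python
# Counts from Table 1 (fictitious cohort data).
a, b = 40, 20   # diseased: exposed, unexposed
c, d = 60, 80   # non-diseased: exposed, unexposed

risk_exp = a / (a + c)      # 0.40
risk_unexp = b / (b + d)    # 0.20

risk_ratio = risk_exp / risk_unexp     # 2.0
odds_ratio = (a / c) / (b / d)         # = a*d / (b*c) ≈ 2.67

# The odds ratio falls further from the null (1.0) than the risk ratio;
# the rate ratio (not computable here without person-time) would fall
# between the two.
assert risk_ratio < odds_ratio
```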
Difference-based measures are appropriate when effects are additive (e.g., see [25 (Chapter 2)]), which means that the exposure leads to a similar absolute increase or decrease in rate or risk across subgroups. Although additive relationships may be less common in practice, difference measures may be more understandable to the public when the outcome is rare, and they relate directly to the measures of impact discussed in Section 3.
The numerical ranges of difference measures depend on their component parts. The rate difference ranges from minus to plus infinity, while the risk difference is bounded between minus one and plus one. The situation of no association is reflected by a difference measure of zero. When the measure is formed as the rate or risk among the exposed minus that among the non-exposed, a positive value indicates that the exposure is a risk factor, while a negative value indicates that it is a protective factor. It can be shown that the risk difference falls numerically nearer to the null than the rate difference does. For example, Table 1 yields a risk difference of $0.40 - 0.20 = 0.20$, while the corresponding rate difference would lie further from zero.
The rate difference for exposed and unexposed subjects is defined as $\lambda_1 - \lambda_0$, the difference between the incidence rates in the exposed and the unexposed.
For the special case of a dichotomous exposure, the rate difference, i.e., the difference between the incidence rates in the exposed and unexposed subjects has been termed ‘excess incidence’ [19,27,28], ‘excess risk’ [29], ‘Berkson's simple difference’ [30], ‘incidence density difference’ [1], or even ‘attributable risk’ [29,31], which may have caused some confusion.
The risk difference is defined analogously as $\pi_1 - \pi_0$, the difference between the risks in exposed and unexposed subjects.
Because exposure-specific incidence rates and risks can be obtained from cohort data, all measures of association considered (based on ratios or differences) can be obtained as well. This is also true of case-control data complemented by follow-up or population data (see Sections 2.1 and 2.2). Case-control data alone allow estimation of odds ratios thanks to the identity between disease and exposure odds ratios, which extends to the logistic regression framework. Prentice and Pyke [32] showed that the unconditional logistic model (see also [25 (Chapter 6)]) applies to case-control data as long as the intercept is disregarded. Interestingly, time-matched case-control studies allow estimation of hazard rates (e.g., see [1,33,34]).
Methods for estimation and statistical inference regarding measures of association have a long history. Traditional methods adjust for confounders by direct or indirect standardization of the rates or risks involved, prior to computation of the measure of association, or by stratification, where association measures are computed separately for subgroups and then combined. For measures based on the difference of rates or risks, direct standardization and stratification can be identical, if the same weights are chosen [35]. Generally, however, direct standardization uses predetermined weights chosen for external validity, while optimal or efficient weights are chosen with stratification. Efficient weights make the standard error of the combined estimator as small as possible.
In modern epidemiology, measures of association are most often estimated from regression analysis. Regression adjustment is a form of stratification, which provides more flexibility, but most often relies on large sample sizes for inference. The function applied to the rate or risk in a regression analysis is referred to as the link function in the framework of generalized linear models underlying such analyses (see [36,37] for theory and practical application). For example, linear regression would regress the risk or rate directly on exposure without any transformation, which is referred to as using the identity link. When the exposure is the only predictor in such a model, all link functions fit equally well and simply represent different ways to characterize the association. However, when several exposures or confounders are involved, or if the exposure is measured as a continuous or ordinal variable, some link functions and not others may require interaction or non-linear terms to improve the fit. The most widely used regression models are the Poisson and Cox models for rate ratio estimation from cohort data, the log-linear model for risk ratio estimation from cohort data, and the logistic regression model for odds ratio estimation from cohort or case-control data.
Measures of association based on prevalence parallel those for risk (for point prevalence) or incidence rates (for period prevalence). For example, one can form prevalence ratios, prevalence differences and prevalence odds ratios. They can be estimated from cross-sectional data. These measures are less useful for studying the aetiology of a disease than measures based on incidence. The reason for this is that prevalence reflects both incidence and duration of disease. For a potentially fatal or incurable disease, duration means survival and the exposures that increase incidence may reduce or increase survival, and hence the association of an exposure with prevalence may be very different from its association with incidence.
Measures of associations and related methods of inference are reviewed at length in epidemiologic textbooks (e.g., [20,25,26,29,38–43]).
3 Measures of impact: attributable risk
Measures of impact are used to assess the contribution of one or several exposures to the occurrence of incident cases at the population level. Thus, they are useful in public health to weigh the impact of exposure on the burden of disease occurrence and assess potential prevention programmes aimed at reducing or eliminating exposure in the population. The most commonly used measure of impact is the attributable risk.
The term ‘attributable risk’ (AR) was initially introduced by Levin in 1953 [44] as a measure to quantify the impact of smoking on lung cancer occurrence. Gradually, it has become a widely used measure to assess the consequences of an association between an exposure factor and a disease at the population level. It is defined as the following ratio:
$$\mathrm{AR} = \frac{\Pr(D) - \Pr(D \mid \bar{E})}{\Pr(D)}, \qquad (5)$$

where $\Pr(D)$ denotes the probability of disease in the population, which includes exposed ($E$) and unexposed ($\bar{E}$) subjects, and $\Pr(D \mid \bar{E})$ the hypothetical probability of disease in the same population with all exposure eliminated.
Unlike measures of association (see Section 2.3), AR depends both on the strength of the association between exposure and disease and on the prevalence of exposure in the population, $p_E$. This joint dependency is apparent upon rewriting Eq. (5) as Levin's formula:

$$\mathrm{AR} = \frac{p_E(\mathrm{RR} - 1)}{1 + p_E(\mathrm{RR} - 1)}, \qquad (6)$$

where RR denotes the rate ratio or relative risk of disease for exposed relative to unexposed subjects.
An alternative formulation underscores this joint dependency in yet another manner. Upon using the same decomposition of $\Pr(D)$ and applying Bayes' theorem, AR can be written as a function of RR and of the prevalence of exposure among diseased individuals, $p_{E|D}$, yielding Miettinen's formula:

$$\mathrm{AR} = p_{E|D}\,\frac{\mathrm{RR} - 1}{\mathrm{RR}}. \qquad (7)$$
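Eqs. (6) and (7) are algebraically equivalent, since $p_{E|D}$ follows from $p_E$ and RR by Bayes' theorem. A quick numerical check (the values of $p_E$ and RR are arbitrary):

```python
import math

p_e = 0.30   # prevalence of exposure in the population (illustrative)
rr = 3.0     # relative risk (illustrative)

# Levin's formula, Eq. (6).
ar_levin = p_e * (rr - 1.0) / (1.0 + p_e * (rr - 1.0))

# Prevalence of exposure among cases via Bayes' rule:
# p_{E|D} = p_E * RR / (1 + p_E (RR - 1)).
p_e_cases = p_e * rr / (1.0 + p_e * (rr - 1.0))

# Miettinen's formula, Eq. (7).
ar_miettinen = p_e_cases * (rr - 1.0) / rr

assert math.isclose(ar_levin, ar_miettinen)   # both equal 0.375 here
```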
A high relative risk can correspond to a low or high AR, depending on the prevalence of exposure, which leads to widely different public-health consequences. One implication is that portability is not a usual property of AR, as the prevalence of exposure may vary widely among populations that are separated in time or location. This is in contrast with measures of association such as the relative risk or rate ratio, which are more portable from one population to another, as the strength of the association between disease and exposure might vary little among populations, unless strong interactions with environmental or genetic factors are present.
When the exposure considered is a risk factor (RR > 1), AR lies between 0 and 1 and is therefore often expressed as a percentage. When the exposure is protective (RR < 1), AR is negative, with no lower bound.
AR takes a null value when either there is no association between exposure and disease ($\mathrm{RR} = 1$) or no subject in the population is exposed ($p_E = 0$), as is apparent from Eq. (6).
Some confusion in the terminology arises from the reported use of as many as 16 different terms in the literature to denote the attributable risk [50,51]. However, a literature search by Uter and Pfahlberg [52] found some consistency in terminology usage, with ‘attributable risk’ and ‘population attributable risk’ [19] being by far the most commonly used terms, followed by ‘etiologic fraction’ [6]. Other popular terms include ‘attributable risk percentage’ [45], ‘fraction of aetiology’ [6], and ‘attributable fraction’ [20 (Chapter 4), 48,53,54].
Moreover, additional confusion may originate in the use by some authors [19,29,31] of the term ‘attributable risk’ to denote a measure of association, the excess incidence, that is the difference between the incidence rates in exposed and unexposed subjects (see Section 2.3). Context will usually help the readers to detect this less common use.
While measures of association such as the rate ratio and relative risk are used to establish an association in aetiologic research, AR has a public-health interpretation as a measure of the disease burden attributable or at least related to one or several exposures. Consequently, AR is used to assess the potential impact of prevention programmes aimed at eliminating exposure from the population. It is often thought of as the fraction of disease that could be eliminated if exposure could be totally removed from the population.
However, this interpretation can be misleading because, for it to be strictly correct, the three following conditions have to be met [30]. First, estimation of AR has to be unbiased (see Section 4). Second, exposure has to be causal rather than merely associated with the disease. Third, elimination of exposure has to be without any effect on the distribution of other risk factors. Indeed, as it might be difficult to alter the level of exposure to one factor independently of other risk factors, the resulting change in disease load might be different from the AR estimate. For these reasons, various authors elect to use weaker definitions of AR, such as the proportion of disease that can be related or linked, rather than attributable, to exposure [6].
Several authors have considered an interpretation of AR in terms of aetiologic research. The argument is that if an AR estimate is available for several risk factors jointly, then its complement to 1, i.e., $1 - \mathrm{AR}$, estimates the proportion of disease cases not attributable to these risk factors, thus suggesting how much of the disease occurrence remains to be explained by yet unidentified factors.
AR can be estimated from cohort studies since all quantities in Eqs. (5)–(7) are directly estimable from cohort data. AR estimates can differ depending on whether rate ratios, risk ratios or odds ratios are used, but will be numerically close for rare diseases. For case-control studies, exposure-specific incidence rates or risks are not available, unless the data are complemented with follow-up or population-based data (see Sections 2.1 and 2.2). Thus, one has to rely on odds ratio estimates and either use Eq. (6) with an estimate of the exposure prevalence $p_E$ obtained from the controls (assuming a rare disease), or use Eq. (7) with an estimate of $p_{E|D}$ obtained directly from the case series.
Beside AR, other measures of impact have been proposed, notably the generalized impact fraction and the number of person-years (or potential years) of life lost. The generalized impact fraction broadens the concept of AR and is obtained by replacing the term $\Pr(D \mid \bar{E})$ in Eq. (5), which corresponds to a complete elimination of exposure, by the probability of disease under a modified, rather than fully eliminated, distribution of exposure. It thus quantifies the impact of the partial reductions in exposure that realistic prevention programmes can achieve.
4 Adjusted attributable risk estimation
As is the case for measures of association, unadjusted (or crude or marginal) AR estimates may be inconsistent [6,30,66,70]. The precise conditions under which adjusted AR estimates that take into account the distribution and effect of other factors will differ from unadjusted AR estimates that fail to do so were worked out by Walter [66]. If E and X are two dichotomous factors taking levels 0 and 1, and if one is interested in estimating the AR for exposure E, then the following applies. The adjusted and unadjusted AR estimates coincide (i.e., the crude AR estimate is unbiased) if and only if at least one of the following two conditions holds: (a) E and X are distributed independently in the population, or (b) X is not associated with disease among unexposed subjects.
The extent of bias varies according to the severity of the departure from conditions (a) and (b) above. Although no systematic numerical study of the bias of unadjusted AR estimates has been performed, Walter [66] provided a revealing example of a case-control study assessing the association between alcohol, smoking, and oral cancer. In that study, severe positive bias was observed for crude AR estimates, with a very large difference between crude and adjusted AR estimates both for smoking (51.3% vs. 30.6%, a difference of 20.7 percentage points and a 68% relative difference in AR estimates) and for alcohol (52.2% vs. 37.0%, a difference of 15.2 percentage points and a 48% relative difference). Thus, the prudent approach is to adjust for factors that are suspected or known to act as confounders, in a similar fashion as for estimating measures of association.
Two simple adjusted estimation approaches discussed in the literature are inconsistent. The first approach was presented by Walter [30], and is based on a factorization of the crude risk ratio into two components, similar to those in Miettinen's earlier derivation [71]. In this approach, a crude AR estimate is first obtained under the assumption of no association between exposure and disease (i.e., values of RR or the odds ratio are taken equal to 1 separately for each level of confounding). This term reflects the AR due to confounding factors alone, since it is obtained under the assumption that disease and exposure are not associated. By subtracting this term from the crude AR estimate that ignores confounding factors, and thus reflects the impact of both exposure and confounding factors, what remains is taken as an estimate of the AR for exposure adjusted for confounding [30]. The second approach is based on using Eq. (6) and plugging in a common adjusted RR estimate (odds ratio estimate in case-control studies), along with an estimate of the crude prevalence of exposure in the population, $p_E$; it is inconsistent because Eq. (6) no longer holds when an adjusted RR is combined with a crude exposure prevalence.
By contrast, two adjusted approaches based on stratification yield valid estimates. The Mantel–Haenszel approach consists in plugging into Eq. (7) an estimate of the common adjusted RR (odds ratio in case-control studies) and an estimate of the prevalence of exposure in diseased individuals, $p_{E|D}$; unlike Eq. (6), Eq. (7) remains valid when adjusted quantities are used, which makes the resulting AR estimator consistent.
The weighted-sum approach also allows adjustment for one or more polychotomous factors forming J levels or strata. The adjusted AR is written as a weighted sum over all strata of stratum-specific ARs, i.e., $\mathrm{AR} = \sum_{j=1}^{J} w_j \mathrm{AR}_j$, where the weights $w_j$ are taken as the proportions of diseased individuals in the various strata (case-load weights).
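The weighted-sum idea can be sketched on synthetic stratified cohort data (all counts and risks below are invented): stratum-specific ARs are computed from Eq. (6) within each stratum and combined with case-load weights, and the result is compared both with the fraction of cases that would be avoided if exposure were removed and with the badly biased crude AR:

```python
import math

# Synthetic stratified cohort (all numbers invented for illustration).
# Each stratum: (n_exposed, n_unexposed, baseline_risk); common stratum RR = 2.
rr = 2.0
strata = [(8000, 2000, 0.01),   # stratum 1: exposure common, low baseline risk
          (2000, 8000, 0.05)]   # stratum 2: exposure rare, high baseline risk

total_cases = 0.0
cases_if_unexposed = 0.0
weighted_sum = 0.0
for n_exp, n_unexp, r0 in strata:
    cases_j = n_exp * r0 * rr + n_unexp * r0          # expected cases, stratum j
    p_e_j = n_exp / (n_exp + n_unexp)
    ar_j = p_e_j * (rr - 1) / (1 + p_e_j * (rr - 1))  # Eq. (6) within stratum j
    weighted_sum += cases_j * ar_j                    # case-load weights
    total_cases += cases_j
    cases_if_unexposed += (n_exp + n_unexp) * r0

ar_adjusted = weighted_sum / total_cases
# Direct check: fraction of cases avoided if exposure were removed.
assert math.isclose(ar_adjusted, (total_cases - cases_if_unexposed) / total_cases)

# The crude AR, ignoring the stratum variable, is badly biased here
# (it even turns negative, because exposure and stratum are associated).
risk_unexp_crude = (2000 * 0.01 + 8000 * 0.05) / 10000
risk_overall = total_cases / 20000
ar_crude = (risk_overall - risk_unexp_crude) / risk_overall
assert ar_crude < 0 < ar_adjusted
```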
A natural alternative to generalize these approaches is to use adjustment procedures based on regression models, in order to take advantage of their flexible and unified approach to efficient parameter estimation and hypothesis testing. Regression models allow one to take into account adjustment factors as well as interactions of exposures with some or all adjustment factors. This approach was first used by Walter [30], Sturmans et al. [90] and Fleiss [91], followed by Deubner et al. [92] and Greenland [47]. The full generality and flexibility of the regression approach was exploited by Bruzzi et al. [93], who developed a general AR estimate based on rewriting AR as:

$$\mathrm{AR} = 1 - \sum_j \frac{\rho_j}{\mathrm{RR}_j},$$

where the sum is over all levels $j$ defined jointly by exposure and adjustment factors, $\rho_j$ denotes the proportion of cases in level $j$, and $\mathrm{RR}_j$ the relative risk for level $j$ relative to the reference level, these quantities being obtained from the fitted regression model.
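A minimal numerical sketch of the case-load formula of Bruzzi et al. (the case counts and relative risks are invented): AR is computed from the proportion of cases and the relative risk in each level defined by exposure and the adjustment factor, and checked against the fraction of cases that would be avoided if exposure were removed:

```python
import math

# Case counts by confounder stratum and exposure (invented for illustration),
# with a within-stratum relative risk of 2 for exposed vs. unexposed.
# Cells: (cases, RR of that exposure level within its stratum).
cells = [(160, 2.0),   # stratum 1, exposed
         (20,  1.0),   # stratum 1, unexposed (reference level)
         (200, 2.0),   # stratum 2, exposed
         (400, 1.0)]   # stratum 2, unexposed (reference level)

total_cases = sum(c for c, _ in cells)
# Bruzzi et al.: AR = 1 - sum_j rho_j / RR_j,
# with rho_j the proportion of cases in cell j.
ar = 1.0 - sum((c / total_cases) / rr_j for c, rr_j in cells)

# Self-check: dividing each cell's cases by its RR gives the cases expected
# if everyone were at the reference exposure level.
cases_if_unexposed = sum(c / rr_j for c, rr_j in cells)
assert math.isclose(ar, (total_cases - cases_if_unexposed) / total_cases)
```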
A modification of the approach by Bruzzi et al. was developed by Greenland and Drescher [97] in order to obtain full maximum likelihood estimates of AR. The modification consists in estimating the quantities $\rho_j$ from the fitted values of the model rather than from the observed proportions of cases.
Detailed reviews of adjusted AR estimation [63,82,87,98] are available. Alternative methods to obtain estimates of variance and confidence intervals for AR have been developed either based on resampling techniques [52,87,99–102] or on quadratic equations [103–105].
5 Non-additivity of attributable risks for separate exposures
AR is frequently estimated in multifactorial situations when trying to evaluate the joint and individual impact of multiple exposures. In this context, separate ARs can be estimated for each exposure as well as the overall AR for all or several exposures jointly. This raises a problem since individual contributions of exposures to attributable risk are usually non-additive.
Indeed, Walter [70] showed that the sum of separate ARs for each exposure is not equal to the joint AR unless at least one of the two following conditions is fulfilled: either there is no joint exposure to the different exposures in the population, or the effects of the exposures on disease risk are additive. For two exposures, the latter condition means that the relative risk for joint exposure to the two factors, $\mathrm{RR}_{12}$, satisfies $\mathrm{RR}_{12} = \mathrm{RR}_1 + \mathrm{RR}_2 - 1$, where $\mathrm{RR}_1$ and $\mathrm{RR}_2$ denote the relative risks for exposure to each factor alone.
Table 2, taken from Begg [61], illustrates this problem. It considers two dichotomous exposures, labelled 1 and 2, with strongly multiplicative effects: each of the four exposure combinations has prevalence 0.25, and the relative risks are 9 for each exposure alone and 81 for joint exposure. The overall disease risk is 0.25 and the risk with neither exposure is 0.01, so the joint AR is (0.25 − 0.01)/0.25 = 0.96. Removing either exposure alone brings the average risk down to 0.05, so each separate AR is (0.25 − 0.05)/0.25 = 0.80, and the sum of the separate ARs, 1.60, exceeds not only the joint AR but even 1.
Table 2. Illustration of the phenomenon of non-additivity of attributable risks for two exposures
Exposure to factor 1 | Exposure to factor 2 | Prevalence | Relative risk | Risk | Risk in the absence of factor 1 | Risk in the absence of factor 2
Yes | Yes | 0.25 | 81 | 0.81 | 0.09 | 0.09 |
Yes | No | 0.25 | 9 | 0.09 | 0.01 | 0.09 |
No | Yes | 0.25 | 9 | 0.09 | 0.09 | 0.01 |
No | No | 0.25 | 1 | 0.01 | 0.01 | 0.01 |
The non-additivity problem comes from the fact that, by forming the sum of the separate ARs, cases attributable to joint exposure are counted more than once. In Table 2, the overall risk is 0.25, and removing either factor alone reduces it to 0.05, so each separate AR equals (0.25 − 0.05)/0.25 = 80% and the sum is 160%; yet the joint AR is only (0.25 − 0.01)/0.25 = 96%.
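The non-additivity can be checked directly from the Table 2 risks; the helper `ar` below is an illustrative sketch that computes the AR for eliminating any subset of the two factors:

```python
# Cell risks from Table 2: (exposed to factor 1, exposed to factor 2) -> risk
risk = {(1, 1): 0.81, (1, 0): 0.09, (0, 1): 0.09, (0, 0): 0.01}
prev = 0.25  # each exposure cell has prevalence 0.25

overall = sum(prev * r for r in risk.values())  # population risk

def ar(removed):
    """AR for eliminating the factors in `removed` (a set, e.g. {1} or {1, 2})."""
    counterfactual = sum(
        prev * risk[(0 if 1 in removed else e1, 0 if 2 in removed else e2)]
        for (e1, e2) in risk
    )
    return (overall - counterfactual) / overall

# Separate ARs sum to 160%, yet the joint AR is only 96%
print(round(ar({1}), 2), round(ar({2}), 2), round(ar({1, 2}), 2))  # → 0.8 0.8 0.96
```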
Because non-additivity is somewhat counter-intuitive and generates misinterpretation, three alternative approaches have been suggested: one based on variance decomposition methods [106] rather than AR estimation; one based on estimating the assigned share, or probability of causation, of a given exposure, which is relevant in litigation procedures for individuals with multiple exposures [107–113]; and one based on an extension of the concept of AR [114,115]. This last approach relies on partitioning techniques [116,117] and remains within the framework of AR estimation by introducing the sequential AR, which generalizes the concept of AR. The principle is to define an order among the exposures considered; the contribution of each exposure is then assessed sequentially according to that order. The contribution of the first exposure is calculated as the standard AR for that exposure alone. The contribution of the second exposure is obtained as the difference between the joint AR estimate for the first two exposures and the separate AR estimate for the first exposure; the contribution of the third exposure is the difference between the joint AR estimates for the first three and the first two exposures, and so on. Thus, a multidimensional vector consisting of the contributions of each separate exposure is obtained.
These contributions are meaningful in terms of potential prevention programmes that consider successive rather than simultaneous elimination of exposures from the population. Indeed, each step yields the additional contribution of the elimination of a given exposure once higher-ranked exposures are eliminated. At some point, additional contributions may become very small, indicating that there is not much point in considering extra steps. By construction, these contributions sum to the overall AR for all exposures jointly, which constitutes an appealing property. Of course, separate vectors of contributions are obtained for different orders. Meaningful orders depend on practical possibilities in implementing potential prevention programmes in a given population. Average contributions can be calculated for each given step (i.e., the first step, second step, etc.) by calculating the mean of contributions corresponding to that step over all possible orders. These average contributions have been termed partial ARs [114], and they represent another potentially useful measure.
For the data in Table 2, sequential ARs would be equal to 80% and 16% whichever factor is removed first, and they sum to the joint AR of 96%.
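The sequential and partial AR construction can be sketched on the Table 2 data; the helper names (`joint_ar`, `sequential_ars`) are illustrative rather than taken from the cited methods:

```python
from itertools import permutations

# Cell risks from Table 2 and the common cell prevalence of 0.25
risk = {(1, 1): 0.81, (1, 0): 0.09, (0, 1): 0.09, (0, 0): 0.01}
prev = 0.25

def joint_ar(removed):
    """AR for jointly eliminating the factors in `removed`."""
    overall = sum(prev * r for r in risk.values())
    counterfactual = sum(
        prev * risk[(0 if 1 in removed else e1, 0 if 2 in removed else e2)]
        for (e1, e2) in risk
    )
    return (overall - counterfactual) / overall

def sequential_ars(order):
    """Contribution of each factor when factors are removed in `order`."""
    contribs, removed, running = {}, set(), 0.0
    for factor in order:
        removed.add(factor)
        contribs[factor] = joint_ar(removed) - running
        running += contribs[factor]
    return contribs

orders = list(permutations((1, 2)))
for order in orders:
    print(order, {f: round(c, 2) for f, c in sequential_ars(order).items()})

# Partial AR: average contribution of each factor over all possible orders
partial = {f: sum(sequential_ars(o)[f] for o in orders) / len(orders)
           for f in (1, 2)}
print({f: round(p, 2) for f, p in partial.items()})  # averages sum to the joint AR
```

By construction, each order's contributions sum to the joint AR, and the partial ARs (0.48 each here) do as well.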
Methods for visualizing sequential and partial ARs are provided by Eide and Heuch [118]. An illustration is given by Fig. 1, based on data from the case-control study of oesophageal cancer conducted in the Ille-et-Vilaine district of France. This study included 200 cases and 775 controls selected by simple random sampling from electoral lists [119]. The assessment of the associations of alcohol consumption and smoking with oesophageal cancer has been the focus of detailed illustration by Breslow and Day [25], who presented various approaches to odds ratio estimation. Upon considering 0–39 g/day as the reference category for alcohol consumption, 29 cases and 386 controls were in the reference category, while 171 cases and 389 controls were in the exposed (i.e., 40+ g/day) category.

Sequential attributable risk estimates for elevated alcohol consumption (40+ g/day) and heavy smoking (10+ g/day) for two different orders of removal (top panel (a): alcohol, then smoking; bottom panel (b): smoking, then alcohol) – Case-control data on oesophageal cancer [119].
Hence, considering the first order of risk factor removal (i.e., eliminating alcohol consumption above 39 g/day followed by eliminating smoking above 9 g/day) yields a sequential AR estimate of 70.9% for elevated daily alcohol consumption, with the remainder of the joint AR attributed to heavy smoking.
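The 70.9% figure for alcohol can be reproduced from the case-control counts quoted above, applying the crude version of the formula AR = 1 − Σ_j ρ_j/RR_j with the odds ratio as the relative-risk estimate:

```python
# Case-control counts for alcohol (reference 0-39 g/day vs 40+ g/day)
cases = {"0-39": 29, "40+": 171}
controls = {"0-39": 386, "40+": 389}

total_cases = sum(cases.values())
# Crude odds ratio for the exposed category, used as the RR estimate
odds_ratio = (cases["40+"] * controls["0-39"]) / (controls["40+"] * cases["0-39"])

# AR = 1 - sum_j rho_j / RR_j over the two exposure levels
ar_alcohol = (1 - cases["0-39"] / total_cases
                - (cases["40+"] / total_cases) / odds_ratio)
print(f"{100 * ar_alcohol:.1f}%")  # → 70.9%
```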
A detailed review of properties, interpretation, and variants of sequential and partial ARs was provided by Land et al. [115].
6 Conceptual issues in estimating attributable risk
As first pointed out by Greenland and Robins [53], there are three distinct measures within the concept of AR. It is easier to make this point using the formulation of AR in Eq. (7) and considering the attributable risk in the exposed individuals. In Eq. (7), AR is expressed as the product of two terms: the prevalence of exposure in diseased individuals, and the attributable risk in the exposed individuals,

AR_E = (RR − 1)/RR, (8)

where RR denotes the relative risk associated with exposure.
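Assuming the second term of the product is the attributable risk among the exposed, (RR − 1)/RR, the product form can be verified numerically on a hypothetical cohort (all counts below are illustrative):

```python
# Hypothetical cohort: 1000 exposed subjects with risk 0.20,
# 4000 unexposed subjects with risk 0.05
n_exp, r_exp = 1000, 0.20
n_unexp, r_unexp = 4000, 0.05

cases_exp = n_exp * r_exp
cases_unexp = n_unexp * r_unexp
overall_risk = (cases_exp + cases_unexp) / (n_exp + n_unexp)

ar_definition = (overall_risk - r_unexp) / overall_risk  # definitional AR
p_exp_cases = cases_exp / (cases_exp + cases_unexp)      # exposure prevalence in cases
rr = r_exp / r_unexp
ar_product = p_exp_cases * (rr - 1) / rr                 # product form

print(round(ar_definition, 3), round(ar_product, 3))  # → 0.375 0.375
```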
Greenland and Robins [53] proposed to distinguish three separate measures of impact that are conceptually different. They made these distinctions for the attributable risk in the exposed individuals.
The third AR measure is defined as the proportion of disease cases in which exposure played an aetiologic role, either by contributing to disease occurrence through making the case's incidence time earlier than it would have been in the absence of exposure (i.e., disease would have occurred in the absence of exposure, only later) or by causing disease occurrence outright (i.e., disease would not have occurred in the absence of exposure). It is the AR counterpart to the 'aetiologic fraction' definition of the attributable risk in the exposed.
7 Final remarks
Disease frequency is measured through the computation of incidence rates or estimation of disease risk. Both measures are directly accessible from cohort data. They can be obtained from case-control data only if these are complemented by follow-up or population data. Using regression techniques, methods are available to derive incidence rates or risk estimates specific to a given exposure profile. Exposure-specific risk estimates are useful in individual prediction. A wide variety of options and techniques are available for measuring association. Adjustment for confounding is a key point in all analyses of observational studies, and can be pursued by standardization, stratification, and regression techniques. The flexibility of the latter, especially in the generalized linear model framework, together with the availability of computer software, has made regression techniques widely applied in recent years.
Several measures are available to assess the impact of an exposure in terms of the occurrence of new disease cases at the population level, among which AR is the most commonly used. Several approaches have been developed to derive adjusted AR estimates from case-control as well as cohort data, either based on stratification or on more flexible regression techniques. Sequential and partial ARs have been proposed to handle the situation of multiple exposures and circumvent the associated non-additivity problem. Although there remain issues in properly interpreting the concept of AR, AR remains a useful measure to assess the potential impact of exposure at the population level and can serve as a suitable guide in practice to assess and compare various prevention strategies.
General problems of AR definition, interpretation and usefulness, as well as its properties, have been reviewed in detail [6,30,51,62,122,123]. Special issues were reviewed by Benichou [62,63]. They include estimation of AR for risk factors with multiple levels of exposure or with a continuous form, multiple risk factors, recurrent disease events, and disease classification with more than two categories. They also include assessing the consequences of exposure misclassification on AR estimates. Specific software for attributable risk estimation [100,124,125] as well as a simplified approach to confidence interval estimation [126] have been developed to facilitate implementation of these methods. Finally, much remains to be done to promote proper use and interpretation of AR, as illustrated in a recent literature review [127].