Testing Gateway Theory: do cigarette prices affect illicit drug use?

Journal of Health Economics 21 (2002) 679–698 


Michael Beenstock a,∗ , Giora Rahav b 

a Department of Economics, Hebrew University of Jerusalem, Jerusalem, Israel 

b Department of Sociology, Tel Aviv University, Tel Aviv, Israel 

Received 18 February 2000; received in revised form 26 July 2001; accepted 7 February 2002 

Abstract 

We test the causal Gateway Theory of drug use dynamics by way of a natural experiment. We 

randomize cigarette smoking by birth cohort and cigarette prices. We use data for Israel to show that 

while cigarette smoking causes cannabis use, the evidence that cannabis use causes hard drug use is 

much weaker. These results are based on various econometric methodologies including two-stage 

logit (2SL), bivariate probit, and frailty analysis for survival data. © 2002 Elsevier Science B.V. All 

rights reserved. 

Keywords: Gateway Theory; Drug addiction; Treatment effects; Natural experimentation 

1. Introduction 

The economic analysis of illicit drugs has quite a long history (e.g. Rottenberg, 1968; 

Nisbet and Vakil, 1972; Moore, 1973, and White and Lusketich, 1983). However, following 

the pioneering work of Becker and Murphy (1988) economists have become increasingly 

interested in addictive consumer behaviour. Much of this interest has focussed upon the determinants 

of cigarette smoking (e.g. Chaloupka, 1991), alcohol (e.g. Mullahy and Sindelar, 

1993), gambling and illicit drugs (e.g. Grossman and Chaloupka, 1999). Economists have 

also contributed to the policy debate concerning whether drugs should be legalized or decriminalized 

(Prinz, 1997; Frey, 1997), and they have made their contribution to the interface 

between drugs and crime (e.g. Model, 1991 and Miron and Zweibel, 1991, 1995). In this 

paper, we open a new research front for economists, namely econometric investigation 

of Gateway Theory which until now has been monopolized by non-economists, including 

epidemiologists, sociologists, psychiatrists and others. 

∗ Corresponding author. Tel.: +972-2-5883-120; fax: +972-2-581-6071. 

E-mail address: msbin@mscc.huji.ac.il (M. Beenstock). 

0167-6296/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. 

PII: S0167-6296(02)00009-7


Gateway Theory was originally developed in the 1970s by Kandel (1975). She observed 

that there is a systematic sequencing in the use of psychoactive substances which runs from 

alcohol and cigarettes, then to cannabis, and finally to “hard” drugs such as cocaine, heroin 

and LSD. Cigarettes are a “gateway” to cannabis, which in turn is a “gateway” to hard drugs. 

Of course not all cigarette smokers go on to use cannabis, and it should be noted that not 

all cannabis consumers first smoked cigarettes. Nevertheless, cigarette smokers are more 

likely to use cannabis subsequently than non-smokers. A similar juxtaposition applies to 

cannabis and hard drugs; cannabis users are more likely to use hard drugs eventually than 

non-users of cannabis, but not all hard drug consumers used cannabis first. 

The scientific literature on Gateway Theory is too vast to be reviewed here. However, 

it splits into two broad camps. One camp (e.g. O’Donnell and Clayton, 1982) regards the gateway effect as causal or generative. According to this view cigarette smoking induces cannabis use, and cannabis use induces hard drug consumption. This implies that if smoking were restricted there would be less use of cannabis. It also implies that if cannabis were legalized there would be more use of hard drugs. It is obvious how this causal interpretation of Gateway Theory has been grist to the mill of the anti-liberalization lobby. The rival camp (e.g. Baumrind, 1983) views the gateway effect as merely predictive or even descriptive.

Economists will recognize this as Granger causality; due to systematic sequencing cigarette 

consumption may help predict cannabis consumption, and cannabis consumption may help 

predict hard drug consumption. However, Granger causality does not imply causality itself 

and has no implications for policy. 

The longstanding debate about Gateway Theory revolves around the identification problem. 

Does the fact that cigarette smokers are more likely to go on to use cannabis result

from unobserved heterogeneity, i.e. people with a greater susceptibility to smoke cigarettes 

also have a greater susceptibility to consume cannabis, or does it result from a treatment 

effect, i.e. exposure to cigarettes (the treatment) induces cannabis use (the outcome)? The 

vast number of empirical papers on Gateway Theory have not resolved the identification 

problem. 

One way to resolve this identification problem is to apply the methodology of natural 

experimentation, which in the present context seeks to randomize cigarette smoking so that 

its causal effect on subsequent use of cannabis is identified. We use the data for Israel to 

apply the methodology, and follow Evans and Ringel (1999) who study the effect of smoking 

on birth weights by using cigarette price data to randomize smoking behavior. They use 

cross section data and randomize cigarette smoking by exploiting differential tax rates on 

cigarettes in the US. Pacula (1998) too uses cross section data to identify the gateway effect 

of alcohol by exploiting differential tax rates on alcohol in the US. 

Natural experiments in econometrics usually exploit cross section differences in instrumental 

variables. A methodological innovation in our approach to natural experimentation 

consists of exploiting time series data rather than cross section data. We do so because 

there are no cross sectional or geographical differences in cigarette prices (or other potential 

instruments) in Israel. However, there has been extensive variation in the real price of 

cigarettes over time. People who grew up when cigarettes were cheap are more likely to 

smoke than people who grew up when they were expensive. Because individuals do not 

choose their year of birth or the price of cigarettes and other variables, we have the basis of

a natural experiment.


The remainder of the paper is organized as follows. In Section 2, we describe the methodology 

that we use for testing Gateway Theory. In Section 3 we describe the data which come 

from nationwide surveys of Jews in Israel conducted in 1989, 1992 and 1995. In Section 4, 

we perform two tests of Gateway Theory. In the first test we focus on sequencing and investigate 

whether smoking eventually induces cannabis use, and whether the latter eventually 

induces use of hard drugs. In the second, we focus on timing and investigate whether earlier 

initiation of smoking induces earlier initiation of cannabis use. The latter attaches importance 

to timing of the gateway effect, whereas the former does not. Our main conclusions 

for Israel are that cigarette smoking causally increases the likelihood of subsequently using 

cannabis, but cannabis use does not causally increase the likelihood of subsequently using 

hard drugs. 

2. Methodology 

2.1. Linear probability 

The first hypothesis of interest is whether cigarette smoking increases the probability of subsequently using cannabis. Define $C_n = 1$ if individual $n$ smoked cigarettes and 0 otherwise, and $S_n = 1$ if individual $n$ subsequently used cannabis and 0 otherwise. We

exclude those who consumed cannabis either prior to smoking cigarettes or who did so 

without smoking. These exceptions have troubled gateway theorists, but they are not relevant 

for testing the first hypothesis, which does not claim that cigarettes are a precondition for 

cannabis. 

Although in Section 4 we do not apply the linear probability model, we use it here 

to illustrate the methodological issues involved. We use time subscripts (t) to indicate 

sequencing of drug use, and to remind us, e.g. that C occurs before S. Gateway Theory 

suggests: 

$$S_{nt} = \alpha X_{nt} + \beta C_{n(t-1)} + \gamma_y D_y + u_{nt} \qquad (1)$$

where $X$ is a vector of controls including personal characteristics, $D_y$ the birth cohort indicator for those born in year $y$, and $u$ an unobserved error term. Gateway Theory hypothesizes that $\beta > 0$. By definition $C_{t-1}$ is predetermined because it occurs prior to $S_t$. However, this does not help identify the treatment effect of $C$ upon $S$ because $C$ and $u$ are likely to be positively correlated. This arises from the auxiliary model for smoking:

$$C_{n(t-1)} = \phi Z_{n(t-1)} + \theta_y D_y + v_{n(t-1)} \qquad (2)$$

where $Z$ is a vector of controls determining smoking and $v$ denotes the unobserved heterogeneity in smoking. If people with a natural susceptibility to cigarettes are also more susceptible to cannabis, then $E(u_t v_{t-1}) > 0$, in which case estimates of $\beta$ will be biased upwards. If in this case the estimate of $\beta$ is positive and statistically significant when in reality $\beta = 0$, there is Granger causality but no genuine or generative causality: prior knowledge of $C_{t-1}$ may be used to predict $S_t$. If, on the other hand, the estimate of $\beta$ is not statistically significant, there is neither Granger causality nor generative causality.


The solution to this problem is to instrument $C_{t-1}$ in Eq. (1). No doubt $Z$ and $X$ largely overlap. An element of $Z$ is the price of cigarettes (and other variables discussed in Section 4) that prevailed when individual $n$ was growing up ($P_{n(t-1)}$), which are hypothesized to determine the demand for cigarettes but not the demand for cannabis. We expect $C_{t-1}$ to vary inversely with $P_{t-1}$. There is no reason to suspect that $P_{t-1}$ and $v_{t-1}$ are correlated (nor $P_{t-1}$ and $u_t$). If $S_t$ does not depend upon $P_{t-1}$, then $P_{t-1}$ identifies the effect of $C_{t-1}$ upon $S_t$ in Eq. (1).
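The identification argument can be illustrated with a small simulation. The sketch below is not part of the original analysis: the data-generating process, the binary price indicator, and all parameter values are assumptions chosen only to show how unobserved susceptibility inflates the naive slope in a linear probability model, while a price instrument that is unrelated to susceptibility recovers the assumed $\beta$.

```python
import random

def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def simulate(n=200_000, beta=0.10, seed=0):
    """Hypothetical data: a shared frailty raises both smoking and cannabis use."""
    rng = random.Random(seed)
    P, C, S = [], [], []
    for _ in range(n):
        p = rng.randint(0, 1)       # 1 = grew up when cigarettes were expensive
        f = rng.random()            # unobserved susceptibility (assumed uniform)
        c = 1 if f > 0.3 + 0.4 * p else 0   # a high price deters smoking
        # Linear probability of cannabis use: causal effect beta plus frailty.
        s = 1 if rng.random() < 0.05 + beta * c + 0.3 * f else 0
        P.append(p)
        C.append(c)
        S.append(s)
    return P, C, S

P, C, S = simulate()
beta_ols = cov(S, C) / cov(C, C)   # naive slope, contaminated by frailty
beta_iv = cov(S, P) / cov(C, P)    # slope instrumented by the price indicator
print(f"naive: {beta_ols:.3f}  instrumented: {beta_iv:.3f}  assumed beta: 0.100")
```

Under these assumptions the naive estimate lies well above the assumed effect because smokers are, on average, more susceptible, whereas the instrumented estimate is centred on it. The result depends entirely on the instrument being independent of the frailty term, which is the exclusion restriction discussed in the text.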

The next step in the gateway chain concerns the relationship between cannabis and hard drugs. The hypothesis of interest here is whether exposure to cannabis induces a higher probability of subsequently using hard drugs. We denote $H_n = 1$ if individual $n$ used hard drugs after using cannabis and 0 otherwise. We express this as follows:

$$H_{n(t+1)} = \lambda Y_{n(t+1)} + \delta S_{nt} + \rho_y D_y + w_{n(t+1)} \qquad (3)$$

where $Y$ is a vector of controls for hard drug use. Gateway Theory suggests $\delta > 0$. Since we suspect that $E(wu) > 0$, unobserved heterogeneity is likely to bias upwards estimates of $\delta$. Ideally, we seek exclusion restrictions for $Y$ and $X$ such as drug price data, which are not available in Israel. Instead, we suggest using as an instrument for $S_t$ in Eq. (3) its fitted value from IV estimation of Eq. (1). We refer to this as “domino” identification because the natural experiment is indirect. Clearly, such indirect identification weakens the power of tests on the value of $\delta$. Random exposure to cigarettes at time $t-1$ induces cannabis consumption at time $t$, which in turn induces hard drug use at time $t+1$. If $\beta$ and $\delta$ are properly identified, and there is a causal gateway effect, we can expect a chain reaction where raising the price of cigarettes will reduce cigarette smoking, which will reduce the subsequent use of cannabis, which, in turn, will reduce the subsequent use of hard drugs.

Note that Eqs. (1)–(3) control for cohort effects. This is possible because we use several surveys in Section 4 ($y$ = year of survey − age). Had there been only one survey it would not have been possible to identify the separate effects of $P_{t-1}$ and birth cohort upon drug consumption.

2.2. Recursive bivariate probit and two-stage procedures 

If $u$ and $v$ in Eqs. (1) and (2) happen to be bivariate normal with $\mathrm{cov}(u, v) = \rho$ then the model may be estimated by maximum likelihood as a recursive bivariate probit (RBP) model (Maddala, 1983, pp. 122–123, Greene, 2000, pp. 852–825). Note that in this case $C$ in Eq. (1) does not have to be instrumented because, in contrast to the linear probability model, it is not estimated by least squares. However, Maddala points out that identification requires that $X$ omit covariates in $Z$. If $X = Z$ the model is not identified, even parametrically.

Maddala suggests that an alternative procedure is to specify $\mathrm{prob}(C^* > 0)$ in Eq. (1) instead of $C$, where $C^*$ denotes the underlying latent variable that measures the propensity to smoke cigarettes. In this case, it may be shown that a two-stage procedure provides consistent estimates of the parameters in Eqs. (1) and (2). In the first stage, Eq. (2) is estimated by probit or logit, and in the second stage the predicted probability of $C$ obtained from the first stage is used to replace $C$ in Eq. (1).
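The mechanics of the two-stage procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the specification estimated in Section 4: the simulated data, the single regressor in each stage, the coefficient values, and the helper `fit_logit` (a plain Newton-Raphson routine written for this example) are all hypothetical, and controls and cohort indicators are omitted.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, iters=15):
    """Newton-Raphson logit with intercept a and slope b on one regressor."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p                 # score with respect to a
            g1 += (yi - p) * xi          # score with respect to b
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: frailty f enters both smoking (C) and cannabis use (S).
rng = random.Random(1)
price, C, S = [], [], []
for _ in range(20_000):
    p = rng.uniform(0.0, 2.0)            # stand-in for the personalized price
    f = rng.gauss(0.0, 1.0)              # unobserved susceptibility
    c = 1 if rng.random() < sigmoid(1.5 - 1.5 * p + f) else 0
    s = 1 if rng.random() < sigmoid(-2.0 + 1.5 * c + f) else 0
    price.append(p)
    C.append(c)
    S.append(s)

# Stage 1: logit of smoking on the instrument, then fitted probabilities.
a1, b_price = fit_logit(price, C)
pps = [sigmoid(a1 + b_price * p) for p in price]

# Stage 2: logit of cannabis use on the predicted probability of smoking.
a2, b_pps = fit_logit(pps, S)
print(f"stage 1 price coefficient: {b_price:.3f}")
print(f"stage 2 coefficient on predicted smoking: {b_pps:.3f}")
```

Because the fitted probability varies only with the price, it carries none of the individual frailty, which is the sense in which the second-stage coefficient is purged of unobserved heterogeneity. The second-stage standard errors from such a routine would not be valid as they stand, since the regressor is itself estimated.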

There are advantages and disadvantages to both procedures. The main disadvantage of 

RBP is that it may be sensitive to parametric assumptions about the unobserved heterogeneity.


If u and v do not happen to be bivariate normal then estimates of β in Eq. (1) will be biased 

and inconsistent (unless ρ = 0). This problem does not arise in the two-stage procedure 

because it does not require estimates of ρ. The disadvantage of the two-stage procedure is 

that although it is consistent, it is not efficient. By contrast, RBP estimates are consistent 

and efficient provided that u and v happen to be bivariate normal. 

In Section 4, we use both procedures. 1 However, the main results that we report are for 

two-stage logit (2SL), mainly because it is less parametric, and is therefore less sensitive 

to mis-specification error. A disadvantage of the two-stage procedure is that the standard 

errors of the parameters are difficult to calculate (Maddala, 1983, p. 247). In Section 4, we 

use a bootstrap procedure to calculate them. 
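A bootstrap of a two-step estimator resamples individuals and repeats both steps on every resample, so that the sampling error of the first stage is carried through to the second. The Python sketch below illustrates the idea under stated assumptions: the data are simulated, and a simple instrumental-variable ratio (`wald_iv`, a hypothetical stand-in) replaces the two-stage logit to keep the example short.

```python
import random

def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def wald_iv(rows):
    """Stand-in two-step estimator: slope of S on C instrumented by P."""
    P = [r[0] for r in rows]
    C = [r[1] for r in rows]
    S = [r[2] for r in rows]
    return cov(S, P) / cov(C, P)

def bootstrap_se(rows, estimator, reps=200, seed=0):
    """Standard deviation of the estimator across i.i.d. resamples of rows."""
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(reps):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(sample))   # both steps are redone each time
    mean = sum(draws) / reps
    return (sum((d - mean) ** 2 for d in draws) / (reps - 1)) ** 0.5

# Hypothetical data: (price indicator, smoker, cannabis user) per individual.
rng = random.Random(42)
rows = []
for _ in range(2000):
    p = rng.randint(0, 1)
    f = rng.random()                      # unobserved susceptibility
    c = 1 if f > 0.3 + 0.4 * p else 0
    s = 1 if rng.random() < 0.05 + 0.10 * c + 0.3 * f else 0
    rows.append((p, c, s))

estimate = wald_iv(rows)
se = bootstrap_se(rows, wald_iv)
print(f"estimate: {estimate:.3f}, bootstrap S.E.: {se:.3f}")
```

The number of resamples is a parameter of the sketch; accuracy of the bootstrap standard error improves with more resamples at the cost of computing time.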

2.3. Hazard analysis 

In this section, we focus on the timing of events rather than their sequencing. The specific 

questions we ask are whether earlier initiation of cigarette smoking causes earlier initiation 

of cannabis, and whether the latter causes earlier initiation of hard drugs. Elsewhere 

(Beenstock and Rahav, 2001), we suggest the use of long term survivor models to specify 

the initiation hazard since drug use is a minority activity. This is also the approach used 

by Douglas and Hariharan (1994). However, this approach is not practical in the current, 

more complicated, context in which our objective is to identify treatment effects. Instead, 

we model drug use initiation using Cox’s proportional hazards model (CPHM), which has 

the advantage of not necessarily implying that it is a matter of time before everyone uses 

drugs. 2 

The counterpart of Eq. (1) for individual $n$’s hazard of using cannabis ($\lambda_s$) is:

$$\lambda_s(t_n) = \lambda_{s0}(t_n)\exp[-(X_n\alpha + \beta A_{cn} + \gamma_y D_{yn})] \qquad (4)$$

where $A_{cn}$ denotes the age at which individual $n$ first smoked cigarettes, $X$ and $D$ are defined as in Eq. (1), and $\lambda_{s0}$ is the “baseline” hazard. We denote the age of cannabis initiation by $A_s$, where $A_s > A_c$ by definition. According to Gateway Theory $\beta > 0$, since people who begin smoking later will initiate cannabis later. Unobserved heterogeneity is likely to generate positive covariance between $A_s$ and $A_c$, which will bias upwards estimates of $\beta$. We suggest that instead of using $A_c$ in Eq. (4) it should be replaced by its expected value as determined from a CPHM for cigarette smoking. The specification of the hazard for cigarette smoking ($\lambda_c$) parallels Eq. (2) and is written as follows:

$$\lambda_c(t_n) = \lambda_{c0}(t_n)\exp[-(Z_n\phi + \theta_y D_{yn})] \qquad (5)$$

where $Z$ is defined as in Eq. (2). Eq. (5) implies that expected age at cigarette initiation is:

$$E(A_{cn}) = \exp[-\Lambda_c(t)\exp(Z_n\phi + \theta_y D_{yn})] \qquad (6)$$

where $\Lambda_c$ denotes the integrated hazard function for smoking evaluated at the mean. We suggest a two-stage procedure in which Eq. (5) is estimated in the first stage and the solution to Eq. (6) is used to replace $A_c$ in Eq. (4) in the second stage. The parameter standard errors in the second stage are bootstrapped. Having thus estimated Eq. (4), we may use it to calculate $E(A_s)$ and to replace $A_s$ in a CPHM for hard drugs.

1 Evans and Ringel (1999), in a similar situation to ours, side-step these methodological issues.

2 Larson and Dinse (1985) propose a long term survivor model for CPHM which is used in Beenstock and Rahav (2001).
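The sign convention in Eq. (4) is easy to misread, so a short numerical sketch may help. Because the baseline hazard is common to all individuals, it cancels when two individuals who differ only in the age at which they began smoking are compared, and the ratio of their cannabis hazards is $\exp(-\beta\,\Delta A_c)$. The value of $\beta$ below is hypothetical and is not an estimate from the paper.

```python
import math

def hazard_ratio(beta, delay_years):
    """Cannabis hazard of a later starter relative to an earlier starter.

    Follows the sign convention of Eq. (4), where the index enters the
    hazard with a minus sign; the baseline hazard cancels in the ratio.
    """
    return math.exp(-beta * delay_years)

beta = 0.05   # hypothetical value chosen for illustration only
for delay in (1, 3, 5):
    ratio = hazard_ratio(beta, delay)
    print(f"smoking initiated {delay} year(s) later -> hazard ratio {ratio:.3f}")
```

With a positive $\beta$ the ratio is below one and falls as the delay grows, which is the sense in which later smoking initiation implies later cannabis initiation.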

In the bio-statistical literature this class of dependent hazard model is known as a “frailty” model (Hougaard, 1995; Klein and Moeschberger, 1997). Frailty models usually consist of two hazard functions that happen to be stochastically related. If $u$ denotes the stochastic component of the empirical hazard function for cigarette initiation, and $v$ denotes its counterpart for cannabis initiation, then frailty models estimate the parameter $\rho = \mathrm{cov}(u, v)$. However, they do not usually specify $A_c$ in the hazard function for cannabis. Here, we specify the dependence between the two hazard functions directly via $\beta$ in Eq. (4), as in the two-stage procedure that we have described. A more general specification would estimate both $\rho$ and $\beta$ in Eq. (4), i.e. frailty may be transmitted directly via $\beta$ and indirectly via $\rho$. 3

3. The data 

3.1. The surveys 

The data to be used in the present analysis are from three epidemiological surveys carried 

out in Israel by the Israel Anti-Drug Authority in 1989 (N = 5280), 1992 (N = 1816) and 

1995 (N = 5044), providing a total of some 12,700 observations. While sample size varied 

between surveys, the sampling procedure remained the same. The sample is intended to 

represent the Jewish population in Israel aged 18–40 years. It was based on a geographic 

sample of points in each of the larger Jewish cities and towns (i.e. with population of 

30,000 or more) and a sample of the smaller localities. Starting at each of these geographic 

points a cluster of 10 residential apartments was sampled. Experienced interviewers first 

checked if there were individuals in the designated age bracket in the household. One 

individual was selected from each household by one of two, predetermined methods: either 

the first available eligible person was interviewed, or the interviewee was selected by the 

Troldahl–Carter method (for more details see Barnea et al. (1990)). 

Interviewers were instructed to make at least three attempts to obtain a sampled household, 

or person within the household. If there was no response from an apartment there was no 

way of determining whether it is residential, commercial, or an unoccupied apartment. Most 

probably there is some under-representation of new immigrants, and others who are either 

away from home much of their time, or do not speak Hebrew. Otherwise, the sample seems 

to be representative of Israeli Jews in this age bracket who live at home. It does not cover 

at all institutionalized persons, including those in military service away from home and 

prisoners, nor members of Kibbutzim (collective farms) who make up about 2% of the 

population. 

3 In the classic frailty model the survival hazards of parents and children are assumed to be related via ρ. This 

has a genetic interpretation; strong parents breed strong children. Additionally, or independently, there may be a 

behavioral effect in that the early death of the parent may adversely affect the survival of the child. This direct 

frailty effect is captured by β. We are unaware of empirical examples where direct and indirect frailty effects are 

estimated.


Table 1
Sequencing in ever use of substances (%)

From \ To     Cigarettes   Cannabis   Hard drugs   Stopped
Nothing       -            1.1        0            -
Cigarettes    -            12.2       2.9          24.6
Cannabis      5.2          -          12.5         58.6
Hard drugs    5.5          11.0       -            62.1
Total         59.0         8.2        1.15         -

Data on cigarette, alcohol and illicit drug consumption were collected through a series 

of questions using the following structure: 

1. Have you ever used X? 

2. If yes, how old were you when you used X for the first time? 

3. If yes, how much X have you used (a) ever, (b) during the last year, (c) during the last 

month? Fill in relevant category of use. 

The data were collected using face-to-face interviews. However, respondents provided 

answers to questions on illicit drug data anonymously at the end of the interview. 4 

Individuals who report positive ever-use of X but have not used X during the last year are 

deemed to have stopped using it (at least for the time being). Table 1 gives a bird’s eye view 

of the data and the sequencing of drug use. The bottom row of the table records the total who 

ever used the particular substance. For example, 59% of those who participated in the three 

samples reported that they ever smoked, while only 1.15% reported that they ever used hard 

drugs (cocaine, heroin, LSD). These totals do not take account of sequencing. The rows 

in Table 1 indicate sequencing. For example, 12.2% of cigarette smokers subsequently 

used cannabis, and 2.9% subsequently used hard drugs without using cannabis first. A 

total of 12.5% of cannabis users subsequently used hard drugs. Note that the data are not 

entirely consistent with gateway sequencing. A total of 1.1% used cannabis without smoking 

cigarettes first, i.e. about 5.5% of cannabis users did not smoke first. A total of 5.2% of 

cannabis users subsequently started smoking cigarettes. 
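The step from 1.1% to "about 5.5% of cannabis users" is not spelled out. One reading that reproduces the figure, offered here as an assumption rather than as the authors' stated calculation, treats the 1.1% as a share of respondents who had not smoked beforehand and approximates that group by the 41% who never smoked.

```python
ever_smoked = 0.59             # Table 1, bottom row
ever_cannabis = 0.082          # Table 1, bottom row
cannabis_from_nothing = 0.011  # Table 1, "Nothing" row

# Assumption: the 1.1% is expressed relative to those who had not smoked,
# approximated here by the complement of the ever-smoked share.
never_smoked = 1.0 - ever_smoked
share_of_cannabis_users = never_smoked * cannabis_from_nothing / ever_cannabis
print(f"{share_of_cannabis_users:.1%} of cannabis users did not smoke first")
```

On this reading the result is approximately 5.5%, in line with the text. If the 1.1% were instead a share of the whole sample, the implied share of cannabis users would be considerably larger, so the row-percentage reading appears to be the one intended.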

Such reverse gateway behavior is also apparent in the consumption of hard drugs. 2.9% 

of cigarette smokers subsequently used hard drugs without first using cannabis. 11% of 

hard drug users subsequently used cannabis and 5.5% of them subsequently started smoking

cigarettes. 

It is clear that cigarette smoking is neither a necessary nor sufficient precondition for 

cannabis, which in turn is neither a necessary nor sufficient precondition for hard drugs. Most 

investigators of the gateway sequencing hypothesis would tend to accept such deviations as 

being within the margin of error. Our view is that the issue is not important. The important question is not whether it is possible to use cannabis prior to cigarettes (it evidently is), but whether incidental use of cigarettes increases the risk of subsequently using cannabis. Likewise, the question will be whether the incidental use of cannabis increases the risk of subsequently using hard drugs.

4 It is well known that despite attempts to guarantee confidentiality such data may contain reporting errors. Tsibel (2000) uses our data to estimate a true model and a misreporting model. She finds that misreporting depends upon age and the number of inconsistencies in replies.

Table 1 further indicates that the majority of illicit drug users stop using them, e.g. 62.1% 

of hard drug users had stopped. The table shows that the proportion of cigarette smokers 

who stop smoking is smaller than the proportion of stoppers among illicit drug users. The 

data are truncated because users may stop after the survey date. We have shown elsewhere 

(Beenstock and Rahav, 2000) that the data in the table understate the true rates of stopping, 

and that whereas the stopping hazard for cannabis is duration dependent, the stopping hazard 

for hard drugs is age dependent. 

We do not attach importance to alcohol as a gateway substance, because in Israel the 

cultural significance of alcohol is quite different from its significance in other western 

countries. Apart from the usual demographic variables, such as age, education, country of 

origin, and father’s country of origin, the surveys also contain data about religious practice, 

frequency of pub visits, and a range of economic variables. 

3.2. Instrumental variables 

To identify the gateway effect of cigarettes on cannabis use we seek instruments that affect the use of cigarettes without simultaneously affecting the use of cannabis. Tobacco

taxes are uniform in Israel (except in Eilat where there is no VAT), hence we cannot implement 

Evans and Ringel’s (1999) idea. However, cigarette taxes have changed over time, 

as has the world price of tobacco. This suggests that the relative price of cigarettes may 

serve as a possible instrument for smoking. People who grew up when cigarettes were 

relatively cheap are more likely to smoke, given everything else, than people who grew 

up when cigarettes were expensive. In this case, randomization is by birth cohort and the 

price of cigarettes. The natural experiment consists of investigating whether birth cohorts 

who matured when cigarettes were relatively cheap were more prone to smoking, and as a 

consequence, were more prone to use cannabis. 

The crucial identifying restriction is that the price of cigarettes at the time when individual 

n was growing up affects the probability of him smoking them, but it does not affect the 

probability of using cannabis at a later date. There is no intertemporal substitution between 

cigarettes and cannabis; e.g. consumers do not plan to consume cannabis in the future as a 

result of an increase in the current price of cigarettes. In this sense, we assume consumers 

are myopic. If this were not the case the gateway effect would be inverted; when cigarette 

consumption falls cannabis consumption should increase. We show later that this does 

not happen in practice, and that cannabis consumption is not affected by cigarette prices, 

whereas cigarette consumption is so affected. This does not necessarily mean that there is 

no intratemporal substitution between cigarettes and cannabis. If an individual eventually 

happens to consume both cigarettes and cannabis a change in their relative price might 

induce him to alter his relative demand for cigarettes. 

Unfortunately, there are no systematic data for cannabis prices in Israel. Had they existed, however, we would have dated them at time t rather than time t − 1. Because, according to gateway sequencing, smoking initiation at time t − 1 does not depend upon cannabis prices at a later date (time t), they would have constituted a further source of identifying information.


The approach may be extended to other variables with possibly identifying power. These 

include exposure to anti-smoking regulations such as the introduction of mandatory health 

warnings. Older birth cohorts, who were young before such regulations were introduced, are 

more likely to smoke, given other considerations, than younger birth cohorts. The accident 

of birth serves to randomize exposure to the regulatory environment. 

A third possible source of natural experimentation consists of the herd effect; the individual 

who happens to grow up when smoking is more prevalent may be more likely to 

smoke as a result of demonstration effects. Since the individual has no control over his 

environment, aggregate smoking patterns during the years of high risk exposure (say, aged 

15–20 years) may serve to randomize individual smoking behavior. 

In Figs. 1 and 2 we use CBS data to plot the real price of cigarettes (as measured by 

the average price per packet deflated by the consumer price index) in Israel, and aggregate 

cigarette consumption per capita. The data indicate that cigarettes were relatively expensive 

in the 1960s, while per capita cigarette consumption was lower. Subsequently, cigarettes 

became cheaper and per capita cigarette consumption increased. 5 A person born in the 

early 1950s would have experienced relatively high cigarette prices in his teens, and a more 

smoke-free environment. By contrast, a person born in the 1970s would have experienced 

relatively low cigarette prices in his teens, and a less smoke-free environment. Controlling 

for other factors, the former is less likely to smoke than the latter, if indeed herd effects and the price of cigarettes affect smoking initiation.

We use the data in Fig. 1 to calculate a weighted average of cigarette prices for each 

individual during his “impressionable” years. The weights are taken from the hazard function 

for smoking initiation (Fig. 3, taken from Beenstock and Rahav (2001)). This gives 

a higher weight in the ages 15–20 years, when the risk of smoking initiation is high, and 

smaller weights when the risk of smoking initiation is low. We apply a similar procedure 

to the data in Fig. 2. In this way, we calculate for each individual the personalized 

price of cigarettes and smoking environment. These variables are naturally randomized 

by birth year, cigarette prices and aggregate smoking behavior. This is similar to the 

approach taken by Douglas and Hariharan (1994) who used a narrower window (15–18 

years) than ours. These variables are used as instrumental variables because they are independent of unobserved heterogeneity that might influence the propensity to smoke and use drugs.
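The construction of the personalized price can be sketched as a hazard-weighted average. In the Python example below the hazard weights and the price series are invented for illustration and are not the series plotted in the figures; the function name `personalized_price` is likewise hypothetical.

```python
def personalized_price(birth_year, price_by_year, hazard_by_age):
    """Hazard-weighted average of real cigarette prices over the ages at risk."""
    num = den = 0.0
    for age, h in hazard_by_age.items():
        year = birth_year + age
        if year in price_by_year:       # ignore years outside the price series
            num += h * price_by_year[year]
            den += h
    return num / den

# Hypothetical initiation hazards, peaking in the late teens.
hazard_by_age = {13: 0.01, 14: 0.02, 15: 0.05, 16: 0.07, 17: 0.08,
                 18: 0.09, 19: 0.06, 20: 0.05, 21: 0.03, 22: 0.02}

# Hypothetical real price index that declines steadily over time.
price_by_year = {y: 100.0 - 1.5 * (y - 1960) for y in range(1960, 1996)}

early = personalized_price(1952, price_by_year, hazard_by_age)
late = personalized_price(1972, price_by_year, hazard_by_age)
print(f"born 1952: {early:.1f}, born 1972: {late:.1f}")
```

Two individuals of the same age in different surveys, or of different ages in the same survey, receive different values because their high-risk years fall in different price regimes. The same weighting could be applied to an aggregate consumption series to obtain a personalized smoking environment.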

4. Results 

We report two types of gateway test using the methodology of natural experimentation. 

In the first, we estimate the recursive model as specified in Section 2.2 using a bivariate 

probit procedure (RBP), and a two-stage logit (2SL) procedure. The second type of test 

refers to the relationship between smoking initiation and cannabis initiation. We apply the methodology described in Section 2.3 and estimate a CPHM for the former in which the personalized relative price of cigarettes is hypothesized to influence the hazard of smoking. 6

5 Unfortunately, there are no systematic prevalence data. Since 1972, the Health Education Department at the Ministry of Health has occasionally commissioned surveys on smoking prevalence. These data have been analysed by Ben-Sira (e.g. Ben-Sira (1993)). They suggest that between 1972 and 1988 smoking prevalence was stable, but it fell during the 1990s.

Fig. 3. Survival and hazard functions for age at cigarette initiation.

4.1. Test 1: sequencing 

We begin by reporting results for the 2SL procedure, which we compare with results 

obtained using the RBP procedure. Table 2 records our chosen logit model for the use of 

cigarettes. All the variables are statistically significant. The “personalized” relative price of 

cigarettes reduces the probability of smoking (odds ratio = 0.991). Among the demographic 

controls we mention that women are less likely to smoke, as are more religious people. 

People born in Israel are less likely to smoke, as are people whose fathers were born in Asia. Cohort effects are picked up by two variables, age and survey 1989, both of which imply that earlier cohorts are more likely to smoke.

6 Douglas and Hariharan (1994) estimated a split sample hazard model for smoking. In Beenstock and Rahav (2001), we estimate split sample hazard models for cannabis and hard drugs both parametrically and semi-parametrically. In the present context, split sample estimation is not numerically feasible because of the need for bootstrapping. CPHM nevertheless does not require the population survival rate to tend to 0 and in this sense it mimics split sample models.

Table 2
Logit model for cigarette smoking

Variable              Coefficient   S.E.
Intercept              1.2715       0.1254
Female                −0.6689       0.0390
Age                    0.0623       0.0072
Israel                −0.2634       0.0601
Middle East           −0.1677       0.0514
Balkan                −0.1926       0.0636
Asia                  −0.4209       0.1747
Eastern Europe        −0.3033       0.0690
Education 4            0.2823       0.0521
Religious (high)      −0.8841       0.0528
Religious (middle)    −0.2316       0.0453
Survey 1989            0.4450       0.0614
Cigarette Price       −0.0094       0.0013

N = 12,455; −2 log L = 16,171; P-value for χ² = 0.0001.
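The odds ratio quoted for the personalized price follows directly from the logit coefficient in Table 2, since in a logit model the odds ratio for a one-unit change in a regressor is the exponential of its coefficient. The short check below uses only the reported coefficient; the variable names are illustrative.

```python
import math

coef_price = -0.0094   # Table 2, coefficient on the cigarette price

# Odds ratio for a one-unit increase in the personalized price.
odds_ratio = math.exp(coef_price)
print(f"odds ratio = {odds_ratio:.3f}")
```

An odds ratio just below one per unit of the price variable can still imply a sizeable effect over the range of prices observed across cohorts, because the effect compounds multiplicatively with each unit.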

We use this model to calculate the predicted probability of smoking (PPS, denoted in the 

last row of Table 3). Table 3 records our chosen model for subsequent cannabis use. The PPS 

has a large and highly significant positive effect on the probability of subsequent cannabis 

use. Had we used the dummy variable for prior smoking (C in Eq. (1) and the dependent 

variable in Table 2) instead of its instrumented value, it could have been argued that the 

observed gateway effect from smoking to cannabis is the result of spurious correlation 

induced by unobserved heterogeneity. In fact, the dummy variable for smoking turns out to be

even more significant; however, this significance is exaggerated by unobserved

heterogeneity. Since PPS is purged of unobserved heterogeneity, the conclusion from Table 3

must be in favor of a causal gateway effect from cigarettes to cannabis. 
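The two-stage logic can be sketched on simulated data. This is a minimal illustration, not the authors' specification: it uses a single instrument and no demographic controls, and the data-generating process, the coefficient values and the names `price`, `frailty`, `smoke` and `cannabis` are all assumptions made for the example. Stage 1 fits a logit of smoking on the price, stage 2 replaces the smoking dummy with its predicted probability (PPS), and a naive logit on the raw dummy is fitted for comparison.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(x, y, iters=25):
    """Newton-Raphson fit of P(y = 1) = sigmoid(a + b * x); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p               # score with respect to the intercept
            g1 += (yi - p) * xi        # score with respect to the slope
            h00 += w                   # elements of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


random.seed(0)
n = 2000
price = [random.uniform(-1.0, 1.0) for _ in range(n)]   # hypothetical standardized price
frailty = [random.gauss(0.0, 1.0) for _ in range(n)]    # unobserved heterogeneity
smoke = [int(random.random() < sigmoid(-3.0 * p + u)) for p, u in zip(price, frailty)]
cannabis = [int(random.random() < sigmoid(-2.0 + 2.0 * s + u)) for s, u in zip(smoke, frailty)]

a1, b1 = fit_logit(price, smoke)                         # stage 1: smoking on price
pps = [sigmoid(a1 + b1 * p) for p in price]              # predicted probability of smoking
a2, b2 = fit_logit(pps, cannabis)                        # stage 2: cannabis on PPS
_, b_naive = fit_logit([float(s) for s in smoke], cannabis)  # uninstrumented dummy

print(f"stage 1 price coefficient: {b1:.3f}")
print(f"stage 2 PPS coefficient:   {b2:.3f}")
print(f"naive dummy coefficient:   {b_naive:.3f}")
```

The PPS and dummy coefficients are on different scales, so they should not be compared directly. The point of the sketch is only that PPS varies with the price alone and therefore cannot pick up the simulated frailty term, whereas the raw dummy can.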

Table 3
Logit model for subsequent cannabis use

Variable                     Coefficient     S.E.   Bootstrap S.E.
Intercept                        −6.0553   0.3838
Female                            0.2294   0.1067           0.1146
Age                              −0.0207   0.0059           0.0061
Israel                            0.5566   0.0898           0.1055
Education 4                      −0.7391   0.1105           0.1144
Education 5–7                    −0.7072   0.0823           0.0756
Religious 3                      −0.4492   0.0818           0.0910
Pub frequency                     0.2701   0.0287           0.0271
Cigarette probability (PPS)       6.4529   0.4347           0.5417

N = 12,382; −2log L = 5782; P-value for χ² = 0.0001.

Table 4
Comparison between 2SP and 2SBP

Variable            2SP                  2SBP
Cigarette prices    −0.00553 (7.119)     −0.00538 (7.133)
ρ                   0                    0.57 (27.1)
PPS                 3.228 (13.058)       3.402 (14.300)

The t-values are shown in parentheses.

Two types of parameter standard errors are reported in Table 3: the original standard
errors and their bootstrapped counterparts. The standard errors have been bootstrapped for
two reasons. First, PPS is an estimate of its true value. Second, IV estimation in any
case requires correction of the standard errors in the second stage. We use 200
i.i.d. resamplings to calculate the standard errors. Andrews and Buchinsky (1999) discuss
the number of resamplings required to attain a prescribed degree of accuracy for different
P-values. It turns out that the original (unbootstrapped) standard errors provide a reliable
guide to statistical significance. Also, the original parameter estimates reported in Table 3
turn out to be close to the means of their bootstrapped counterparts, suggesting that the
crude estimates are unbiased.
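The claim that the original standard errors are a reliable guide to significance can be checked directly from the Table 3 figures. The sketch below computes the coefficient-to-standard-error ratio under both sets of standard errors and asks whether the verdict at the conventional 5% level changes. The 1.96 threshold is our choice of convention, not something stated in the text, and the intercept is omitted because no bootstrapped standard error is reported for it.

```python
# (coefficient, original S.E., bootstrap S.E.) copied from Table 3.
TABLE3 = {
    "Female": (0.2294, 0.1067, 0.1146),
    "Age": (-0.0207, 0.0059, 0.0061),
    "Israel": (0.5566, 0.0898, 0.1055),
    "Education 4": (-0.7391, 0.1105, 0.1144),
    "Education 5-7": (-0.7072, 0.0823, 0.0756),
    "Religious 3": (-0.4492, 0.0818, 0.0910),
    "Pub frequency": (0.2701, 0.0287, 0.0271),
    "PPS": (6.4529, 0.4347, 0.5417),
}
CRITICAL = 1.96  # two-sided 5% critical value of the standard normal (assumed convention)

agree = True
for name, (coef, se, boot_se) in TABLE3.items():
    t_orig = coef / se
    t_boot = coef / boot_se
    same_verdict = (abs(t_orig) > CRITICAL) == (abs(t_boot) > CRITICAL)
    agree = agree and same_verdict
    print(f"{name:14s} t(original) = {t_orig:7.2f}   t(bootstrap) = {t_boot:7.2f}")

print("same significance verdict for every variable:", agree)
```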

It is noteworthy that if the relative price of cigarettes is omitted from Table 2, PPS ceases 

to be statistically significant in Table 3. This suggests that there is identifying information 

in cigarette prices, and that the significance of PPS in Table 3 is not simply the result of 

parametric identification. 

We experimented with other instruments, including herd effects and exposure to cigarette 

regulations. In the former case we used “personalized” cigarette consumption per capita 

(based on Fig. 2) as an additional covariate in Table 2, but it was not statistically significant. 7

In the latter case, we defined exposure to be the number of adult years lived prior to the 

introduction of mandatory health warnings on cigarette packets. We expected that more 

exposed individuals are more likely to smoke, holding other variables constant. However, 

this did not turn out to be the case. This was true when we used a variety of health warning 

trigger dates (1983 Israel, 1971 UK, 1965 US) both individually and collectively. If anything, 

we found that more exposed individuals, as defined, were less likely to smoke. However, this 

may be the manifestation of some complex cohort effect; early birth cohorts are less likely 

to smoke. In summary, the main instrument for smoking turned out to be the personalized 

relative price of cigarettes. 

Other results in Table 3 include a negative cohort effect: older birth cohorts are less likely

to use cannabis. Sabras (native Israelis) are more likely to use cannabis, as are the uneducated

and the highly educated. When controlling for other factors, women and secular Jews are more

likely to use cannabis.

As noted in Section 2.2, 2SL is consistent but not efficient, because it restricts ρ = 0. 

Inefficiency may induce bias in non-linear estimators. To investigate this we re-estimated the 

model using two-stage probit (2SP), solved for PPS, and then iterated PPS using two-stage 

bivariate probit (2SBP). Coefficient estimates for the main variables of interest are shown 

in Table 4. 

7 It is possible that prevalence data would have produced different results. See Footnote 5.


Table 5
Logit model for subsequent hard drug use

Variable               Coefficient     S.E.   Bootstrap S.E.
Intercept                     −5.0   2.3445
Female                     −1.0621   0.2611           0.2880
Age                         0.0089   0.0151           0.0140
Israel                     −0.5690   0.2743           0.3345
Middle East                −1.1456   0.3123           0.3187
Balkan                     −1.1456   0.3544           0.4364
Eastern Europe             −1.2418   0.3759           0.4513
Education 4                −0.6849   0.2919           0.2909
Education 5–7              −1.1975   0.2969           0.3440
Religious (high)           −1.1149   0.4127           0.4388
Religious (middle)         −0.7614   0.2968           0.3644
Survey 1989                −0.5022   0.2382           0.2673
Survey 1992                −0.6256   0.3070           0.2989
Pub frequency               0.3001   0.1106           0.1514
PPH                         1.0967   2.6563           2.7315

N = 12,382; −2log L = 1261; P-value for χ² = 0.0001.

Table 4 shows that the enhanced efficiency of 2SBP makes little difference to the parameter 

estimates. We also found that 2SP estimates are similar to the 2SL estimates reported 

in Tables 2 and 3. This suggests that while 2SL is not efficient, and may be biased in small 

samples, the loss of efficiency is likely to be minimal. 

We used the specification in Tables 2 and 3 to estimate by RBP instead of 2SL. To save 

space we do not report the counterparts of Tables 2 and 3 that were obtained using RBP. On 

the whole the results were similar; the signs and significance of the coefficients were the 

same as in 2SL. The RBP estimate of β in Eq. (1) is 2.337 with t-value 36.57, which may 

be contrasted with its two-stage counterparts reported in Table 4.

In Table 5, we apply the method of “domino” identification to the effect of cannabis on 

hard drugs. The parameter of interest is δ in Eq. (3). We present a 2SL model for subsequent 

hard drug use (i.e. after cannabis) in which PPH is the predicted probability of cannabis 

use that is derived from Table 3. It is positive but not statistically significant, implying 

that there is not a causal gateway effect from cannabis to hard drug use. However, it turns 

out that this result depends upon the specification of the model in Table 3. If, for example, fathers’

country of birth is included as a regressor, then PPH becomes marginally significant. In

summary: becoming a cigarette smoker increases the probability of becoming a user of 

cannabis, which increases the probability of becoming a user of hard drugs. The former link 

in the gateway chain is highly statistically significant, whereas the latter link is less statistically 

robust. Since the asymptotic standard errors of the parameter estimates vary directly 

with the rarity of the event, the standard errors in the model for hard drugs will naturally 

tend to be larger than their counterparts in the model for cannabis. It is possible therefore 

that 12,500 observations may be insufficient to estimate with sufficient accuracy the parameters 

in Table 5. Also, “domino” identification naturally weakens the power of the test 

because of its indirectness. It places perhaps too great a burden of identification on cigarette

prices.
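The link between rarity and precision can be illustrated with the simplest possible case, an intercept-only logit, where the asymptotic standard error of the log-odds is 1/sqrt(n p (1 − p)). The sketch below evaluates this at roughly the sample size mentioned above. The two prevalence rates are hypothetical round numbers chosen for illustration and are not the prevalences of cannabis or hard drug use in the data.

```python
import math


def se_log_odds(n, p):
    """Asymptotic S.E. of the intercept in an intercept-only logit: 1 / sqrt(n * p * (1 - p))."""
    return 1.0 / math.sqrt(n * p * (1.0 - p))


N = 12_500  # approximate number of observations referred to in the text

for label, prevalence in [("hypothetical 10% prevalence", 0.10),
                          ("hypothetical 1% prevalence", 0.01)]:
    print(f"{label}: S.E. of log-odds = {se_log_odds(N, prevalence):.4f}")

ratio = se_log_odds(N, 0.01) / se_log_odds(N, 0.10)
print(f"ratio of standard errors: {ratio:.2f}")
```

With the sample size held fixed, a tenfold drop in prevalence roughly triples the standard error in this stylized case, which is the direction of the effect described in the text.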


So much for the two-stage investigation of the causal effect of cannabis use on hard 

drug use. We now use RBP to investigate the same issue. Strictly speaking, RBP is not 

appropriate because we have added a third link to the gateway chain. A recursive trivariate 

probit (RTP) estimator is required. Since RTP is non-standard we have carried out the 

following approximation. 

(a) We use RBP to model the link between cigarettes and cannabis. 

(b) We use the results to obtain PPS, the predicted probability for smoking. 

(c) We use RBP to model the link between cannabis and hard drugs, where PPS is used in 

the probit model for cannabis. 

This procedure estimates RTP in two stages. Using the variables specified in

Tables 2, 3 and 5, we obtain an estimate of δ of 0.178 (t = 0.708), implying that the

causal effect from cannabis to hard drugs is positive, but not statistically significant. The

associated estimate of ρ is 0.835 (S.E. = 0.135). This result confirms the one obtained by

2SL as reported in Table 5. The estimate of ρ strongly suggests that the apparent gateway effect

from cannabis to hard drugs results from correlated unobserved heterogeneity; it

is not causal.

However, δ became statistically significant when certain variables were dropped from the 

model for hard drugs. When we dropped “pub frequency” and “education 4”, we obtained 

an estimate of δ = 2.61 (t = 5.53) and ρ = −0.22 (S.E. = 0.213). While these restrictions

fail a likelihood ratio test, they undermine the degree to which we can confidently reject the 

hypothesis that there is no causal gateway effect from cannabis to hard drugs. 
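The likelihood ratio test for dropping two variables can be sketched as follows. With two restrictions the statistic is compared with a chi-square distribution with 2 degrees of freedom, whose upper tail probability has the closed form exp(−LR/2). The −2log L values used here are hypothetical placeholders, since the text does not report them for the RBP models.

```python
import math


def lr_test_two_restrictions(m2ll_restricted, m2ll_unrestricted, alpha=0.05):
    """Likelihood ratio test of two exclusion restrictions.

    Both arguments are values of -2 log L. For 2 degrees of freedom the
    chi-square survival function is exp(-x / 2), so no tables are needed.
    """
    lr = m2ll_restricted - m2ll_unrestricted
    p_value = math.exp(-lr / 2.0)
    return lr, p_value, p_value < alpha


# Hypothetical -2 log L values, for illustration only.
lr, p_value, rejected = lr_test_two_restrictions(1262.0, 1250.0)

print(f"LR statistic = {lr:.2f}, p-value = {p_value:.4f}, restrictions rejected: {rejected}")
```

A rejection here is what the text means by restrictions that "fail" the test: the data prefer the model that keeps the two variables.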

4.2. Test 2: initiation hazard 

In Table 6 we present a CPHM for smoking initiation in which the “personalized” price of 

cigarettes is clearly significant, implying that the age of cigarette initiation varies directly 

with the price of cigarettes, after controlling for demographic variables. The model also 

implies that women start smoking later, as do religious people and Israeli born. 

Table 6
CPHM for smoking initiation

Variable               Coefficient       S.E.
Female                   −0.496400    0.02364
Israel                    0.135970    0.03793
Middle East               0.111880    0.03215
Balkan                   −0.075090    0.03909
Asia                     −0.268128    0.11848
Eastern Europe           −0.152799    0.04350
Education 4               0.216162    0.03074
Religious (high)         −0.547501    0.03484
Religious (middle)       −0.148835    0.02748
Survey 1989               0.106948    0.02992
Cigarette price          −0.001895    0.000313

N = 12,455; −2log L = 13,1067; P-value for χ² = 0.0001.


Table 7
CPHM for cannabis initiation

Variable          Coefficient     S.E.   Bootstrap S.E.
Female                 0.4211   0.1192           0.1351
Israel                 0.4527   0.0839           0.0946
Balkan                −0.2003   0.0947           0.0980
Education 4           −0.7770   0.1083           0.1094
Education 5–7         −0.6260   0.0752           0.0681
Pub frequency          0.2682   0.0255           0.0247
PRSS                   2.2195   0.1953           0.2137

N = 12,382; −2log L = 15,874; P-value for χ² = 0.0001.

In Table 7 we present a CPHM for cannabis initiation in which PRSS = Xβ is specified 

as a covariate. Similar results are obtained if Xβ is transformed non-linearly as in the earlier 

equation. We carried out 200 i.i.d. resamplings for bootstrapping. PRSS is statistically significant, 

implying that cigarette initiation has a causal gateway effect on cannabis initiation. 

The longer one is predicted to survive without smoking, the longer one survives without 

using cannabis. Table 7 further implies that women initiate cannabis later, as do religious

people and the children of Balkan fathers. Israeli born initiate earlier despite the fact that 

they start smoking later. 
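The way the cigarette price feeds through PRSS into the cannabis hazard can be traced with the reported coefficients. The sketch below takes the linear specification PRSS = Xβ at face value, builds the index from the Table 6 coefficients for two hypothetical individuals who differ only in the price they faced, and then scales the difference by the PRSS coefficient from Table 7. The covariate profiles and the two price levels (100 and 150) are illustrative assumptions, as the units of the price index are not given in this section.

```python
import math

# CPHM coefficients for smoking initiation (Table 6).
TABLE6 = {
    "Female": -0.496400,
    "Israel": 0.135970,
    "Middle East": 0.111880,
    "Balkan": -0.075090,
    "Asia": -0.268128,
    "Eastern Europe": -0.152799,
    "Education 4": 0.216162,
    "Religious (high)": -0.547501,
    "Religious (middle)": -0.148835,
    "Survey 1989": 0.106948,
    "Cigarette price": -0.001895,
}
PRSS_COEF = 2.2195  # coefficient on PRSS in the cannabis CPHM (Table 7)


def prss(covariates):
    """Linear index X * beta from the smoking CPHM; omitted covariates count as zero."""
    return sum(TABLE6[name] * value for name, value in covariates.items())


# Two hypothetical individuals who differ only in the price index they faced.
cheap = {"Female": 0, "Israel": 1, "Cigarette price": 100.0}
dear = {"Female": 0, "Israel": 1, "Cigarette price": 150.0}

delta_index = prss(dear) - prss(cheap)
smoking_hr = math.exp(delta_index)                # relative hazard of smoking initiation
cannabis_hr = math.exp(PRSS_COEF * delta_index)   # implied relative hazard of cannabis initiation

print(f"change in PRSS:           {delta_index:.5f}")
print(f"smoking hazard ratio:     {smoking_hr:.3f}")
print(f"cannabis hazard ratio:    {cannabis_hr:.3f}")
```

Under these assumptions a higher price lowers the smoking index, and the positive PRSS coefficient carries that reduction over to the cannabis initiation hazard.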

Next, we calculate the predicted relative survival for cannabis (PRSC) from Table 7 for
each individual, and we apply the method of “domino” identification to determine whether
age at cannabis initiation has a causal effect upon age at hard drug initiation. We specify
PRSC in Table 8 as the indirectly instrumented value of the cannabis initiation hazard.
The non-instrumented alternative would have been to specify the age of cannabis initiation.
However, this would have embodied unobserved heterogeneity, which might have induced a
spurious gateway effect. The bootstrapped standard errors are based upon 200 resamplings.
Table 8 indicates that PRSC is positive but not statistically significant, implying that there
is not a causal gateway effect from cannabis to hard drugs.

Table 8
CPHM for hard drug initiation

Variable             Coefficient     S.E.   Bootstrap S.E.
Female                   −0.7609   0.4573           0.4043
Israel                   −0.6227   0.2894           0.2671
Middle East              −1.0511   0.3378           0.3362
Balkan                   −0.9891   0.4203           0.4572
Eastern Europe           −1.0666   0.4070           0.4423
Education 4              −0.4802   0.3201           0.3307
Education 5–7            −0.8362   0.4115           0.3956
Religious (high)         −0.6042   0.8012           0.7854
Survey 1989              −0.5681   0.2314           0.2041
Survey 1992              −0.5729   0.3025           0.3431
Pub frequency             0.1999   0.1961           0.1984
PRSC                      0.3173   0.6052           0.5558

N = 12,382; −2log L = 2128; P-value for χ² = 0.0001.
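The weakness of the second link shows up clearly when the reported estimates are converted into z-ratios and two-sided p-values under a normal approximation. The figures below are taken from the PPH row of Table 5 and the PRSC row of Table 8. Reading the second numeric column as the original standard error and the third as the bootstrapped one is an assumption, made by analogy with Table 3, because the column headers of those tables did not survive in this copy.

```python
import math


def two_sided_p(coef, se):
    """z-ratio and two-sided p-value under a standard normal approximation."""
    z = coef / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# (coefficient, standard error) pairs for the cannabis-to-hard-drugs link.
SECOND_LINK = {
    "PPH (Table 5), original S.E.": (1.0967, 2.6563),
    "PPH (Table 5), bootstrap S.E.": (1.0967, 2.7315),
    "PRSC (Table 8), original S.E.": (0.3173, 0.6052),
    "PRSC (Table 8), bootstrap S.E.": (0.3173, 0.5558),
}

results = {name: two_sided_p(coef, se) for name, (coef, se) in SECOND_LINK.items()}
for name, (z, p) in results.items():
    print(f"{name:32s} z = {z:5.2f}   p = {p:.2f}")
```

None of the four ratios comes close to conventional significance thresholds, in contrast with the PPS and PRSS coefficients for the first link.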

When the age of cannabis initiation is specified instead of PRSC in Table 8, its coefficient 

turned out to be negative and highly statistically significant, implying that age at hard drug 

initiation varies directly with age at cannabis initiation. This result suggests a gateway 

effect from cannabis initiation to hard drug initiation. However, the test in Table 8 shows 

that this effect is not causal since it results from unobserved heterogeneity. People who are 

susceptible to drugs happen to initiate cannabis sooner, and initiate hard drugs sooner. Thus, 

the gateway effect is spurious. 

In summary, beginning to smoke at an earlier age induces earlier use of cannabis, which 

induces earlier use of hard drugs. The former link in the gateway chain is highly statistically 

significant, whereas the latter link is not robust. In these respects, the results from experiment 

2 complement those obtained from experiment 1. 

5. Conclusion 

Controlling for cohort and other effects, people who grew up when cigarettes were relatively 

cheap were more likely to smoke, and to start smoking younger. As a causal consequence 

they were more likely to use cannabis afterwards and to initiate cannabis sooner. Our 

findings from natural experimentation indicate for the first time a causal gateway effect from 

cigarettes to cannabis, both for sequencing and for timing. On the other hand, we do not find 

a causal gateway effect from cannabis to hard drugs, either for sequencing or for timing.

The latter result has to be qualified by two considerations. First, because hard drug use is rare,

hypothesis testing is more difficult. Second, the identification of the causal gateway effect

from cannabis to hard drugs was indirect. 

These findings imply that higher cigarette taxes will not only reduce cigarette smoking, 

they will also reduce cannabis consumption. However, they will not reduce the consumption 

of hard drugs. They further imply that if cannabis is decriminalized, and the consumption of 

cannabis increases (Model, 1991), it will not affect the consumption of hard drugs. Hence,

if policy makers are reluctant to decriminalize cannabis for gateway reasons, our results do

not support this position. It should be recalled, however, that just as the anti-drug legislation 

in the early part of the 20th century was politically inspired (Musto, 1973), so scientific 

reasoning in the 21st century may not sway the issue. 

We admit that our approach has been somewhat ad hoc. This stems from the fact that the 

rational addiction model refers to single substances rather than multiple substances. There 

is a need to extend the rational addiction model to multiple substances in order to shed 

theoretical light upon the gateway phenomenon. Why do some people advance to cannabis 

from cigarettes, or from cannabis to hard drugs, while others do not? Such a model might 

suggest how the prices of cigarettes, cannabis and hard drugs affect the consumption of these 

substances at different stages in the life cycle. In the absence of price data for illicit drugs 

in Israel, we have been forced to place all the burden of identification upon cigarette prices. 

The availability of illicit drug price data in the US makes the extension of our approach 

there of particular interest.

Acknowledgements 


We wish to thank Irena Vovk for her excellent research assistance; the Israel Science 

Foundation and the Eshkol Institute for financial support; the Israel Anti-Drug Authority 

for providing the data; Michael Hvoshnyansky who helped us prepare the final version of 

the paper; and three referees whose remarks considerably improved the paper. 

References 

Andrews, D.W.K., Buchinsky, M., 1999. A three-step method for choosing the number of bootstrap repetitions. 

Econometrica 68, 23–51. 

Barnea, Z., Teichman, M., Rahav, G., Gil, R., Rozenbloom, Y., 1990. The Use of Drugs and Alcohol Among the

Population of Israel: An Epidemiologic Study. The Anti-Drugs Authority, Jerusalem. 

Baumrind, D., 1983. Specious causal attributions in the social sciences: the reformulated stepping stone theory of 

heroin use as an exemplar. Journal of Personality and Social Psychology 45, 1289–1298. 

Becker, G.S., Murphy, K.M., 1988. A theory of rational addiction. Journal of Political Economy 75, 700–765. 

Beenstock, M., Rahav, G., 2000. Maturing Out as a Life-Cycle or Duration Dependent Phenomenon in the Natural 

History of Drug Use in Israel, mimeo. 

Beenstock, M., Rahav, G., 2001. Immunity and Susceptibility in Illicit Drug Initiation in Israel. 

Ben-Sira, Z., 1993. Smoking: A Follow-up on Behavior Patterns and Change. The Louis Guttman Israel Institute 

of Applied Social Research, Jerusalem (in Hebrew). 

Chaloupka, F.J., 1991. Rational addictive behavior and cigarette smoking. Journal of Political Economy 99,

722–742.

Douglas, S., Hariharan, G., 1994. The hazard of starting smoking: estimates from a split population duration 

model. Journal of Health Economics 13, 213–230. 

Evans, W.N., Ringel, J.S., 1999. Can higher cigarette taxes improve birth outcomes? Journal of Public Economics 

18, 135–154. 

Frey, B.S., 1997. Drugs and Economic Policy, Economic Policy. 

Greene, W.H., 2000. Econometric Analysis, 4th Edition. Prentice-Hall, Englewood Cliffs, NJ.

Grossman, M., Chaloupka, F.J., 1999. The demand for cocaine by young adults: a rational addiction approach. 

Journal of Health Economics 18, 427–474. 

Hougaard, P., 1995. Frailty models for survival data. Lifetime Data Analysis 1, 255–274. 

Kandel, D.S., 1975. Stages in adolescent involvement in drug use. Science 190, 912–924. 

Klein, J.P., Moeschberger, M.L., 1997. Survival Analysis. Springer, New York. 

Larson, M.G., Dinse, G.E., 1985. A mixture model for the regression analysis of competing risks data. Applied 

Statistics 34, 201–211. 

Maddala, G.S., 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, 

Cambridge. 

Miron, J.A., Zweibel, J., 1991. Alcohol consumption during prohibition. American Economic Review 81, 242–247

(papers and proceedings).

Miron, J.A., Zweibel, J., 1995. The economic case against drug prohibition. Journal of Economic Perspectives 9, 

175–192. 

Model, K., 1991. The effect of marijuana decriminalization on hospital emergency room drug episodes. Journal 

of the American Statistical Association 88, 737–747. 

Moore, M.H., 1973. Policies to achieve discrimination on the effective price of heroin. American Economic Review 

63, 270–277. 

Mullahy, J., Sindelar, J.L., 1993. Alcoholism, work and income. Journal of Labor Economics 11, 494–520. 

Musto, D.F., 1973. The American Disease: Origins of Narcotic Control. Yale University Press, New Haven. 

Nisbet, C.T., Vakil, F., 1972. Some estimates of price and expenditure elasticities of demand for marijuana among 

UCLA students. Review of Economics and Statistics 54, 473–475. 

O’Donnell, J.A., Clayton, R.R., 1982. The stepping stone hypothesis: marijuana, heroin and causality. Chemical 

Dependency 4, 229–241.


Pacula, R.L., 1998. Adolescent Alcohol and Marijuana Consumption: Is there Really a Gateway Effect? NBER 

working paper 6348. 

Prinz, A., 1997. Do European drugs policies matter? Economic Policy. 

Rottenberg, S., 1968. The clandestine distribution of heroin, its discovery and suppression. Journal of Political 

Economy 75, 78–90. 

Tsibel, N., 2000. Event history analysis accounting for measurement errors and missing data, Ph.D. dissertation. 

The Hebrew University of Jerusalem. 

White, M.D., Lusketich, W.A., 1983. Heroin: price elasticity and enforcement strategies. Economic Inquiry 21, 

557–564.
