Propensity-score matching (PSM) is a widely used statistical technique to estimate causal treatment effects, especially in observational studies where random assignment is not feasible. The idea is to create a statistical equivalent of randomization by matching treated and untreated subjects based on their propensity scores. A propensity score is the probability of receiving the treatment given a set of observed covariates. By matching individuals with similar propensity scores, researchers aim to reduce selection bias and obtain a more accurate estimate of the treatment effect.
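In symbols, the definition above can be sketched as follows. The notation is an assumption of this note rather than part of the original text: T is the binary treatment indicator and X is the vector of observed covariates.

```latex
e(X) = \Pr(T = 1 \mid X)
```

Matching on e(X) can only balance the covariates that enter X; it does nothing for unobserved differences between the groups.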
Implementing Propensity-Score Matching in Stata®
Stata® provides a convenient way to perform propensity-score matching through the teffects psmatch command, part of the teffects suite for treatment-effect estimation. Here’s a general guide on how to do this.
Step 1: Estimate the Propensity Scores
To start, you need to estimate the propensity scores. This is typically done using a logit or probit model where the dependent variable indicates whether the subject received the treatment or not.
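For example, let’s assume your dataset contains a treatment variable treat (1 = treated, 0 = untreated) and a set of covariates X1, X2, …, Xn that you want to control for. Fitting a logit of treat on these covariates and then calling predict gives each subject’s estimated probability (propensity score) of receiving the treatment. Note that teffects psmatch fits this model internally, so a separate logit is needed only if you want to inspect the scores yourself.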
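As a minimal sketch of this step, assuming three covariates named X1, X2, and X3 (placeholders, as elsewhere in this guide) and a new variable name pscore chosen here for illustration:

```stata
* Fit a logit model of treatment on the covariates
logit treat X1 X2 X3

* Store the predicted probability of treatment (the propensity score)
predict pscore, pr

* Compare the score distributions of the two groups
summarize pscore if treat == 1
summarize pscore if treat == 0
```

If the two groups' scores barely overlap, few comparable matches exist and any matching estimate should be read with caution.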
Step 2: Perform Propensity-Score Matching
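Once you have settled on the propensity-score model, you can match treated and untreated individuals. In Stata, this is done using the teffects psmatch command, which takes the outcome variable, the treatment model, and options that control the type of matching. The command re-estimates the propensity scores from the treatment model you give it.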
Here’s an example of performing nearest-neighbor matching:
```stata
teffects psmatch (outcome) (treat X1 X2 X3), nneighbor(1)
```
- outcome: The variable representing the outcome of interest.
- treat: The treatment variable.
- X1 X2 X3: The covariates used to estimate the propensity score (with a logit model by default).
- nneighbor(1): Matches each individual to the one individual in the opposite treatment group with the closest propensity score. This is the default.
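To try the command on real data, here is a sketch using cattaneo2, one of the example datasets that Stata can download with webuse. The dataset and the covariates chosen are assumptions of this example, not part of the guide above. It estimates the effect of maternal smoking (mbsmoke) on birthweight (bweight).

```stata
* Load an example dataset via webuse (requires internet access)
webuse cattaneo2, clear

* Nearest-neighbor propensity-score matching; reports the ATE by default
teffects psmatch (bweight) (mbsmoke mmarried mage fbaby medu), nneighbor(1)

* Same treatment model fitted with probit instead of the default logit
teffects psmatch (bweight) (mbsmoke mmarried mage fbaby medu, probit), nneighbor(1)
```

Comparing the logit and probit results is a quick way to see how sensitive the estimate is to the choice of propensity-score model.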
Step 3: Analyze the Treatment Effect
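After matching, Stata reports the estimated treatment effect together with its standard error, z statistic, p-value, and confidence interval. By default teffects psmatch estimates the average treatment effect (ATE); adding the atet option requests the average treatment effect on the treated (ATT) instead.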
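Note that teffects psmatch estimates one effect per call: the ATE unless told otherwise. The sketch below shows both requests side by side, using the same placeholder variable names as the rest of this guide.

```stata
* Default: average treatment effect in the population (ATE)
teffects psmatch (outcome) (treat X1 X2 X3), nneighbor(1)

* atet: average treatment effect on the treated (ATT)
teffects psmatch (outcome) (treat X1 X2 X3), nneighbor(1) atet
```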
Other Matching Methods
You can customize the matching using options of teffects psmatch, or switch to a related estimator. Some commonly used methods include:
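- Caliper Matching: Nearest-neighbor matching in which a match is accepted only if the two propensity scores differ by no more than a specified caliper. (True radius matching, which uses every untreated individual inside the caliper, is available in the user-written psmatch2 command through its radius option.)
```stata
teffects psmatch (outcome) (treat X1 X2 X3), caliper(0.05)
```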
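- Kernel Matching: This method compares each treated individual with a weighted average of all untreated individuals within a certain bandwidth. teffects psmatch has no kernel option; the user-written psmatch2 command (installed with ssc install psmatch2) provides one.
```stata
psmatch2 treat X1 X2 X3, outcome(outcome) kernel
```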
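- Inverse-Probability Weighting: Instead of matching, each individual is weighted by the inverse of the estimated probability of the treatment they actually received. Stratification (interval matching), in which individuals are divided into strata based on their propensity scores and the effect is estimated within each stratum, is not offered by the teffects suite, so weighting is the closest built-in alternative.
```stata
teffects ipw (outcome) (treat X1 X2 X3)
```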
Step 4: Check Balance
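Before concluding that your matching procedure has worked, it’s important to verify whether the matching has balanced the covariates between the treated and control groups. After teffects psmatch, you can check balance using the tebalance summarize command:
```stata
tebalance summarize
```
This reports standardized differences and variance ratios for each covariate in the raw and matched samples. Standardized differences close to zero and variance ratios close to one indicate good balance. (The pstest command belongs to the user-written psmatch2 package and is meant to be run after psmatch2, not after teffects.)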
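A slightly fuller diagnostic pass might look like the sketch below. It assumes teffects psmatch is the most recent estimation command and uses X1 as the covariate to plot, which is an arbitrary choice for illustration.

```stata
* Fit the matching estimator first; the diagnostics read its stored results
teffects psmatch (outcome) (treat X1 X2 X3), nneighbor(1)

* Standardized differences and variance ratios, raw versus matched
tebalance summarize

* Density of one covariate by treatment group, before and after matching
tebalance density X1

* Overlap plot of the estimated propensity scores
teffects overlap
```

If balance is poor, a common response is to revise the treatment model (for example by adding interactions or squared terms) and check again.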
Example Workflow:
Let’s walk through a simplified example where we have a binary treatment treat, an outcome variable outcome, and covariates age, income, and education:
- Estimate Propensity Scores:
```stata
logit treat age income education
predict pscore
```
- Perform Nearest-Neighbor Matching:
```stata
teffects psmatch (outcome) (treat age income education), nneighbor(1)
```
- Check Balance:
```stata
tebalance summarize
```
Interpreting Results:
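After running the PSM procedure, Stata will provide an estimate of one of the following treatment effects, depending on the options you chose:
- ATE (Average Treatment Effect): The average effect of the treatment across the entire population. This is what teffects psmatch reports by default.
- ATT (Average Treatment Effect on the Treated): The average effect of the treatment for those who received the treatment. Request it with the atet option.
You can interpret these values based on the magnitude and statistical significance of the results. A positive and significant ATT indicates that, among those treated, the treatment raised the outcome on average compared with matched untreated individuals. Whether that is beneficial depends on what the outcome measures. The interpretation is causal only if the covariates capture everything that drives both treatment and outcome.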
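For reference, the two estimands can be written compactly. The potential-outcomes notation is an assumption of this note, not something the guide above introduces: Y(1) and Y(0) are a subject's outcomes with and without treatment, and T is the treatment indicator.

```latex
\mathrm{ATE} = E\left[ Y(1) - Y(0) \right], \qquad
\mathrm{ATT} = E\left[ Y(1) - Y(0) \mid T = 1 \right]
```

The two coincide when the treatment effect does not differ between the kinds of subjects who do and do not take the treatment. They can diverge when it does.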