Today I will teach you some common Stata commands.
- General Plotting Commands
- Plot a histogram of a variable:
graph vn, bin(xx)
- Plot a histogram of a variable using frequencies:
graph vn, bin(xx) freq
- Plot a histogram of a variable with a normal approximation:
graph vn, bin(xx) norm
where xx is the number of bins.
- Plot a boxplot of a variable:
graph vn, box
- Plot side-by-side box plots for one variable (vone) by categories of another variable (vtwo, which should be categorical):
sort vtwo
graph vone, box by(vtwo)
- A scatter plot of two variables:
graph vone vtwo
- A matrix of scatter plots for three variables:
graph vone vtwo vthr, matrix
- A scatter plot of two variables with the values of a third variable used in place of points on the graph (vthr might contain numerical values or indicate categories, such as male (m) and female (f)):
graph vone vtwo, symbol([vthr])
- Normal quantile plot:
qnorm vn
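The `graph` commands above use the older (pre-Stata 8) graphics syntax. Stata 8 introduced separate commands such as `histogram`, `graph box`, and `scatter`. Below is a minimal sketch of the modern equivalents, assuming Stata 8 or newer. It uses the `auto` example dataset that ships with Stata, so `mpg`, `weight`, `price`, and `foreign` stand in for vn, vone, vtwo, and vthr.

```stata
* Modern (Stata 8+) equivalents of the plotting commands above,
* using the auto example dataset shipped with Stata
sysuse auto, clear

* Histogram with 10 bins (density scale by default),
* then on a frequency scale, then with a normal density overlaid
histogram mpg, bin(10)
histogram mpg, bin(10) frequency
histogram mpg, bin(10) normal

* Box plot, then side-by-side box plots by a categorical variable
graph box mpg
graph box mpg, over(foreign)

* Scatter plot (first variable on the y axis) and a scatterplot matrix
scatter mpg weight
graph matrix mpg weight price

* Label each point with a third variable; msymbol(none) hides the marker
scatter mpg weight, mlabel(foreign) msymbol(none)

* Normal quantile plot (same command as before)
qnorm mpg
```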
- General commands
- To compute means and standard deviations of all variables:
summarize
or, using an abbreviation,
summ
- To compute means and standard deviations of selected variables:
summarize vone vtwo vthr
or, using an abbreviation,
summ vone vtwo vthr
- To get more numerical summaries (percentiles, skewness, kurtosis) for one variable:
summ vone, detail
- Correlation between two variables:
correlate vone vtwo
- To see all values (all variables and all observations; not recommended for large data sets):
list
- To list values for two variables:
list vone vtwo
- To list the first 10 values for two variables:
list vone vtwo in 1/10
- To list the last 10 values for two variables:
list vone vtwo in -10/l
(The end of this command is minus 10/letter l.)
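Most of these commands also store their results in `r()`. This is handy when you want to reuse a number rather than read it off the screen. Here is a small sketch using the `auto` example data; the variables are assumptions for illustration, so substitute your own.

```stata
sysuse auto, clear

* Summary statistics for one variable; results are stored in r()
summarize mpg
display "mean = " r(mean) ", sd = " r(sd) ", n = " r(N)

* The detail option adds percentiles, skewness and kurtosis
summarize mpg, detail
display "median = " r(p50)

* Correlation between two variables (stored in r(rho))
correlate mpg weight
display "r = " r(rho)

* First 3 and last 3 observations of two variables
list make mpg in 1/3
list make mpg in -3/l
```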
- Tables
- Tabulate variable vn:
tabulate vn
or, using an abbreviation,
tab vn
- Cross tabulate two variables:
tab vone vtwo
- Cross tabulate two variables; add one or more of the options column, row, or cell to show column, row, or cell percentages, and nofreq to suppress printing of the frequencies:
tab vone vtwo, column row cell
tab vone vtwo, column nofreq
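The sketch below shows these options on the `auto` example data, which is assumed here for illustration. `rep78` has some missing values, and the `missing` option counts them as a category of their own. The `chi2` option adds a test of independence.

```stata
sysuse auto, clear

* One-way frequency table, with missing values shown as a category
tabulate rep78, missing

* Two-way table with column percentages only (frequencies suppressed)
tabulate rep78 foreign, column nofreq

* Two-way table with a chi-squared test of independence
tabulate rep78 foreign, chi2
```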
- Generating new variables
- General.
- Generate an index of cases 1, 2, ..., n (useful if you sort the data and later want to restore the original order without reloading the data):
generate case= _n
or, using an abbreviation,
gen case=_n
- Multiply values in vx by b and add a, store results in vy:
gen vy = a + b*vx
- Generate a variable with all values 0:
gen vone=0
- Generate a variable equal to 1 where vtwo is greater than c, and 0 elsewhere:
gen vone=0
replace vone=1 if vtwo>c
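Two practical points about the commands above are shown below on the `auto` data, which is assumed for illustration. First, the `case` index lets you undo a `sort`. Second, Stata treats a missing value (`.`) as larger than any number. This means `replace vone=1 if vtwo>c` also sets vone to 1 wherever vtwo is missing. Excluding missing values explicitly avoids that.

```stata
sysuse auto, clear

* Record the original order, sort by something else, then restore it
generate case = _n
sort price
sort case

* Indicator: 1 when rep78 > 3, 0 otherwise, missing when rep78 is missing
generate good = 0 if !missing(rep78)
replace good = 1 if rep78 > 3 & !missing(rep78)

* Same indicator in one line: a true/false expression evaluates to 1/0
generate good2 = (rep78 > 3) if !missing(rep78)
```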
- Random numbers.
- Set the number of observations to n:
set obs n
- Set the random-number seed to XXXX (default is 1000):
set seed XXXX
- Generate n uniform random variables (equal chance of all outcomes between 0 and 1):
gen vn=uniform()
- Generate n uniform random variables (equal chance of all outcomes between a and b):
gen vn=a+(b-a)*uniform()
- Generate n discrete uniform random variables (equal chance of each integer from 1 to 6; this simulates rolling a six-sided die):
gen vn=1+int(6*uniform())
- Normal data with mean 0 and standard deviation 1:
gen vn=invnorm(uniform())
- Normal data with mean mu and standard deviation sigma:
gen vn=mu + sigma*invnorm(uniform())
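Newer versions of Stata document `uniform()` under the name `runiform()` and `invnorm()` as `invnormal()`. They also add dedicated generators such as `rnormal()`. The sketch below runs the same simulations with the newer functions, assuming a recent Stata. The seed (12345), sample size (1000), interval (2 to 5), and normal parameters (100, 15) are arbitrary choices for illustration.

```stata
clear
set obs 1000
set seed 12345

* Uniform on (0, 1), and uniform between a = 2 and b = 5
generate u = runiform()
generate u25 = 2 + (5 - 2)*runiform()

* Roll of a fair six-sided die: integers 1 through 6
generate die = 1 + int(6*runiform())

* Standard normal, and normal with mean 100 and standard deviation 15
generate z = rnormal()
generate iq = rnormal(100, 15)

summarize
```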
- Regression
- Compute simple regression line (vy is response, vx is predictor):
regress vy vx
- Compute predictions, creating a new variable yhat:
predict yhat
- Produce a scatter plot with the regression line added:
graph vy yhat vx, connect(.s) symbol(oi)
- Compute residuals, creating a new variable residuals:
predict residuals, resid
- Produce a residual plot (residuals against the predictor) with a horizontal line at 0:
graph residuals vx, yline(0)
- Identify points with the largest and smallest residuals:
sort residuals
list in 1/5
list in -5/l
(The last command is minus 5/letter l.)
- Compute a multiple regression equation (vy is the response; vone, vtwo, and vthr are predictors):
regress vy vone vtwo vthr
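With the Stata 8+ graphics syntax, the fitted-line and residual plots are drawn with `twoway` and `scatter` instead. The sketch below uses the `auto` data, which is an assumption for illustration. `price` plays the role of vy and `weight` the role of vx, with `mpg` and `foreign` as extra predictors.

```stata
sysuse auto, clear

* Simple regression, then fitted values and residuals
regress price weight
predict yhat
predict res, residuals

* Scatter plot with the fitted line overlaid (sort draws the line left to right)
twoway (scatter price weight) (line yhat weight, sort)

* Residual plot with a horizontal reference line at 0
scatter res weight, yline(0)

* Observations with the smallest and largest residuals
sort res
list make price yhat res in 1/5
list make price yhat res in -5/l

* Multiple regression with three predictors
regress price weight mpg foreign
```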
Important Note on the “stem” Command
There is a glitch with Stata’s “stem” command for stem-and-leaf plots: it appears to permanently reorder the data, sorting them by the variable that was plotted. The simplest way to avoid this problem is to skip stem-and-leaf plots and draw histograms instead. If you do want a stem-and-leaf plot, first create a variable containing the original observation numbers (called “index”, for example):
generate index = _n
After the stem-and-leaf plot, re-sort the data by the index variable (Stata command: sort index) to put them back in their original order.
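The workaround is sketched below on the `auto` data, which is assumed for illustration. The final `sort index` does no harm even if your version of `stem` leaves the order untouched.

```stata
sysuse auto, clear

* Save the original observation order before drawing the plot
generate index = _n

* Stem-and-leaf display of mpg
stem mpg

* Put the observations back in their original order
sort index
```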
Commands: Here are some other commands that you may find useful (this is by no means an exhaustive list of all Stata commands):
| Command | Description |
|---|---|
| anova | general ANOVA, ANCOVA, or regression |
| by | repeat an operation for each category of a variable |
| ci | confidence intervals for means |
| clear | clears the previous dataset out of memory |
| correlate | correlation between variables |
| describe | briefly describes the data (# of obs, variable names, etc.) |
| diagplot | distribution diagnostic plots |
| drop | eliminates variables (or observations) from memory |
| edit | better alternative to input for Macs |
| exit | leave Stata |
| generate | creates new variables (e.g. generate years = close - start) |
| graph | general graphing command (this command has many options) |
| help | online help |
| if | lets you select a subset of observations (e.g. list if radius >= 3000) |
| infile | reads a non-Stata-format dataset (ASCII or text file) |
| input | type in raw data |
| list | lists the whole dataset in memory (you can also list only certain variables) |
| log | saves or prints Stata output (except graphs) |
| lookup | keyword search of commands, often a precursor to help |
| oneway | one-way analysis of variance |
| pcorr | partial correlation coefficients |
| plot | text-mode (crude) scatterplots |
| predict | calculates predicted values (y-hat), residuals (ordinary, standardized, and studentized), leverages, Cook’s distance, standard error of a predicted individual y, standard error of the predicted mean y, and standard error of the residual from a regression |
| regress | regression |
| replace | lets you change individual values of a variable |
| save | saves data and labels in a Stata-format dataset |
| serrbar | standard error-bar chart |
| sort | sorts observations from smallest to largest |
| stem | stem-and-leaf display |
| summarize | produces summary statistics (# obs, mean, sd, min, max) (has a detail option) |
| test | conducts hypothesis tests on the most recent model fit (e.g. regress or anova); see help test for information and examples |
| ttest | one- and two-sample t-tests |
| use | retrieves a previously saved Stata dataset |
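The sketch below uses several commands from the table together on the `auto` dataset; the variables are assumptions for illustration. `bysort` is shorthand for sorting and then using the `by` prefix. The `test` command refers back to the most recent model fit, here the `regress` just before it.

```stata
sysuse auto, clear

* Look at the data
describe

* by: repeat a command for each category of foreign
bysort foreign: summarize mpg

* if: restrict a command to a subset of observations
list make price if price > 15000

* Two-sample t-test of mpg between domestic and foreign cars
ttest mpg, by(foreign)

* Fit a regression, then test whether the foreign coefficient is zero
regress price weight foreign
test foreign
```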