## Common Stata Commands

### General Plotting Commands

- Plot a histogram of a variable, where xx is the number of bins: `graph vn, bin(xx)`
- Plot a histogram of a variable using frequencies: `graph vn, bin(xx) freq`
- Plot a histogram of a variable with a normal approximation: `graph vn, bin(xx) norm`
- Plot a boxplot of a variable: `graph vn, box`
- Plot side-by-side box plots for one variable (vone) by categories of another variable (vtwo, which should be categorical): `sort vtwo`, then `graph vone, box by(vtwo)`
- A scatter plot of two variables: `graph vone vtwo`
- A matrix of scatter plots for three variables: `graph vone vtwo vthr, matrix`
- A scatter plot of two variables with the values of a third variable used in place of points on the graph (vthr might contain numerical values or indicate categories, such as male (m) and female (f)): `graph vone vtwo, symbol([vthr])`
- Normal quantile plot: `qnorm vn`

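
The `graph` commands above appear to follow the graphics syntax of Stata 7 and earlier. As a rough sketch of how the same plots might be requested in Stata 8 or later, the commands below use the bundled `auto` example dataset. The variable choices (`mpg`, `weight`, `price`, `foreign`, `rep78`) and the bin count of 10 are illustrative assumptions, not part of the original list.

```stata
* Sketch only: assumes Stata 8 or later and the bundled auto dataset.
sysuse auto, clear

* Histograms: density scale, frequency scale, with a normal curve overlaid
histogram mpg, bin(10)
histogram mpg, bin(10) frequency
histogram mpg, bin(10) normal

* Box plot, then side-by-side box plots by a categorical variable
graph box mpg
graph box mpg, over(foreign)

* Scatter plot and scatter-plot matrix
scatter mpg weight
graph matrix mpg weight price

* Scatter plot with the values of a third variable in place of the points
scatter mpg weight, msymbol(none) mlabel(rep78) mlabposition(0)

* Normal quantile plot
qnorm mpg
```

Each command draws a new graph that replaces the previous one, so it may be easiest to run them one at a time.
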
### General commands

- To compute means and standard deviations of all variables: `summarize` or, using an abbreviation, `summ`
- To compute means and standard deviations of select variables: `summarize vone vtwo vthr` or, using an abbreviation, `summ vone vtwo vthr`
- To get more numerical summaries for one variable: `summ vone, detail`
- Correlation between two variables: `correlate vone vtwo`
- To see all values (all variables and all observations, not recommended for large data sets): `list`
- To list values for two variables: `list vone vtwo`
- To list the first 10 values for two variables: `list vone vtwo in 1/10`
- To list the last 10 values for two variables: `list vone vtwo in -10/l` (The end of this command is minus 10/letter l.)

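
As a small illustration of the summary and listing commands above, the sketch below runs them on Stata's bundled `auto` dataset. The dataset and the variables `make`, `price`, `mpg`, and `weight` are assumptions chosen for the example.

```stata
* Sketch only: assumes the bundled auto dataset is available via sysuse.
sysuse auto, clear

* Means and standard deviations: all variables, then a selection
summarize
summarize price mpg weight

* Percentiles, skewness, kurtosis and more for one variable
summarize mpg, detail

* Correlation between two variables
correlate mpg weight

* First 10 and last 10 observations of two variables
* (the second range ends with the letter l, for "last")
list make mpg in 1/10
list make mpg in -10/l
```
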
### Tables

- Tabulate variable vn: `tabulate vn` or, using an abbreviation, `tab vn`
- Cross tabulate two variables: `tab vone vtwo`
- Cross tabulate two variables, including one or more of the options to produce column, row or cell percents: `tab vone vtwo, column row cell` (add the `nofreq` option to suppress printing of frequencies)

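
The tabulation options can be combined freely. A minimal sketch, assuming the bundled `auto` dataset and its categorical variables `foreign` and `rep78`:

```stata
* Sketch only: assumes the bundled auto dataset is available via sysuse.
sysuse auto, clear

* One-way table of a categorical variable
tabulate rep78

* Two-way table of counts
tabulate foreign rep78

* Row percentages only, with the raw frequencies suppressed
tabulate foreign rep78, row nofreq

* Frequencies plus column, row and cell percentages in each cell
tabulate foreign rep78, column row cell
```
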
### Generating new variables

- General
  - Generate an index of cases (1, 2, ..., *n*) (this may be useful if you sort the data, then want to restore the data to the original form without reloading the data): `generate case = _n` or, using an abbreviation, `gen case = _n`
  - Multiply values in vx by *b* and add *a*, store results in vy: `gen vy = a + b * vx`
  - Generate a variable with all values 0: `gen vone = 0`
  - Generate a variable with values 0 unless vtwo is greater than *c*, then make the value 1: `gen vone = 0`, then `replace vone = 1 if vtwo > c`
- Random numbers
  - Set the number of observations to *n*: `set obs n`
  - Set random number seed to XXXX, default is 1000: `set seed XXXX`
  - Generate *n* uniform random numbers (equal chance of all outcomes between 0 and 1): `gen vn = uniform()`
  - Generate *n* uniform random numbers (equal chance of all outcomes between *a* and *b*): `gen vn = a + (b - a) * uniform()`
  - Generate *n* discrete uniform random numbers (equal chance of all outcomes between 1 and 6; this command simulates rolling a six-sided die): `gen vn = 1 + int(6 * uniform())`
  - Normal data with mean 0 and standard deviation 1: `gen vn = invnorm(uniform())`
  - Normal data with mean *mu* and standard deviation *sigma*: `gen vn = mu + sigma * invnorm(uniform())`

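
The variable-generation and random-number recipes above can be combined into one short simulation. The sketch below is illustrative only. It assumes a recent Stata release, where `runiform()` and `invnormal()` are the documented names for the functions written above as `uniform()` and `invnorm()`. The sample size, seed, interval endpoints, mean, and standard deviation are arbitrary choices.

```stata
* Sketch only: assumes a recent Stata where runiform() and invnormal()
* are the documented names for uniform() and invnorm().
clear
set obs 1000
set seed 12345

* Index of cases, kept so the original order can be restored later
gen case = _n

* Uniform on (0,1), then uniform between a = 2 and b = 5
gen u   = runiform()
gen uab = 2 + (5 - 2) * runiform()

* Six-sided die: int() truncates 6*u, giving 0..5, so adding 1 gives 1..6
gen die = 1 + int(6 * runiform())

* Standard normal, then normal with mean 10 and standard deviation 3
gen z = invnormal(runiform())
gen x = 10 + 3 * invnormal(runiform())

* Indicator that is 1 when x exceeds c = 10 and 0 otherwise
gen big = 0
replace big = 1 if x > 10

tabulate die
summarize u uab z x

* Sort by another variable, then restore the original order
sort z
sort case
```

With 1000 draws, each face of the die should appear roughly one sixth of the time, and the sample mean and standard deviation of `x` should be close to 10 and 3.
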
### Regression

- Compute simple regression line (vy is response, vx is predictor): `regress vy vx`
- Compute predictions, create new variable yhat: `predict yhat`
- Produce scatter plot with regression line added: `graph vy yhat vx, connect(.s) symbol(oi)`
- Compute residuals, create new variable residuals: `predict residuals, resid`
- Produce a residual plot (residuals against vx) with a horizontal line at 0: `graph residuals vx, yline(0)`
- Identify points with largest and smallest residuals: `sort residuals`, then `list in 1/5` and `list in -5/l` (The last command is minus 5/letter l.)
- Compute multiple regression equation (vy is response; vone, vtwo, and vthr are predictors): `regress vy vone vtwo vthr`
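
The regression steps above can be run end to end as follows. This is a sketch under stated assumptions: it uses Stata 8 or later graphics syntax and the bundled `auto` dataset, with `mpg` as the response and `weight` as the predictor. The variable names `yhat` and `res` and the extra predictors `length` and `displacement` are illustrative choices.

```stata
* Sketch only: assumes Stata 8 or later and the bundled auto dataset.
sysuse auto, clear

* Simple regression: mpg is the response, weight is the predictor
regress mpg weight

* Fitted values and residuals from the most recent regression
predict yhat
predict res, resid

* Scatter plot with the fitted line overlaid
twoway (scatter mpg weight) (line yhat weight, sort)

* Residuals against the predictor, with a horizontal line at 0
scatter res weight, yline(0)

* Five smallest and five largest residuals
sort res
list make mpg yhat res in 1/5
list make mpg yhat res in -5/l

* Multiple regression with three predictors
regress mpg weight length displacement
```

Note that `predict` always refers to the most recently fitted model, so the fitted values and residuals should be computed before the next `regress` command is run.
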

### Important Notes on “stem” command

There is a glitch with Stata’s `stem` command for stem-and-leaf plots. The command seems to permanently reorder the data so that they are sorted by the variable that was plotted. The best way to avoid this problem is to avoid stem-and-leaf plots altogether (use histograms instead). However, if you really want a stem-and-leaf plot, first create a variable containing the original observation numbers (called “index”, for example) with the command `generate index = _n`.

After the stem-and-leaf plot, you can then re-sort the data by the index variable (Stata command: `sort index`) so that the data are back in the original order.

**Commands**: Here are some other commands that you may find useful (this is by no means an exhaustive list of all Stata commands):

| Command | Description |
| --- | --- |
| `anova` | general ANOVA, ANCOVA, or regression |
| `by` | repeat operation for categories of a variable |
| `ci` | confidence intervals for means |
| `clear` | clears previous dataset out of memory |
| `correlate` | correlation between variables |
| `describe` | briefly describes the data (# of obs, variable names, etc.) |
| `diagplot` | distribution diagnostic plots |
| `drop` | eliminate variables from memory |
| `edit` | better alternative to `input` for Macs |
| `exit` | leave Stata |
| `generate` | creates new variables (e.g. `generate years = close - start`) |
| `graph` | general graphing command (this command has many options) |
| `help` | online help |
| `if` | lets you select a subset of observations (e.g. `list if radius >= 3000`) |
| `infile` | read non-Stata-format dataset (ASCII or text file) |
| `input` | type in raw data |
| `list` | lists the whole dataset in memory (you can also list only certain variables) |
| `log` | save or print Stata output (except graphs) |
| `lookup` | keyword search of commands, often precursor to `help` |
| `oneway` | oneway analysis of variance |
| `pcorr` | partial correlation coefficients |
| `plot` | text-mode (crude) scatterplots |
| `predict` | calculates predicted values (y-hat), residuals (ordinary, standardized and studentized), leverages, Cook’s distance, standard error of predicted individual y, standard error of predicted mean y, standard error of residual from regression |
| `regress` | regression |
| `replace` | lets you change individual values of a variable |
| `save` | saves data and labels in a Stata-format dataset |
| `sebarr` | standard error-bar chart |
| `sort` | sorts observations from smallest to largest |
| `stem` | stem and leaf display |
| `summarize` | produces summary statistics (# obs, mean, sd, min, max) (has a `detail` option) |
| `test` | conducts various hypothesis tests; refers back to the most recent model fit (e.g. `regress` or `anova`); see `help` for info and examples |
| `ttest` | one and two-sample t-tests |
| `use` | retrieve previously saved Stata dataset |
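
A few of the commands in the table are easiest to understand in combination. The following is a minimal sketch, assuming the bundled `auto` dataset; the cutoff of 30 and the choice of `mpg`, `foreign`, and `rep78` are illustrative only.

```stata
* Sketch only: assumes the bundled auto dataset is available via sysuse.
sysuse auto, clear

* Brief description of the data in memory
describe

* if: restrict a command to a subset of observations
list make mpg if mpg >= 30

* by: repeat a command for each category (the data must be sorted first)
sort foreign
by foreign: summarize mpg

* Two-sample t-test of mpg between the two categories of foreign
ttest mpg, by(foreign)

* One-way analysis of variance of mpg across repair-record categories
oneway mpg rep78
```
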