Common R Commands
Below is a list of common commands that we use for the undergraduate class, along with some examples. You can view the help file for each command in RStudio by searching for the command name in the Help tab in the lower right panel. You can also type a question mark followed by the command name into the console. For example, if you wanted to understand how barplot works, you would type ?barplot.
The list here does not contain information about making plots in R. That information is in the Plotting Cookbook appendix.
Univariate Statistics
mean
Calculate the mean of a quantitative variable. Remember that this command will not work for categorical variables.
## [1] 24.27601
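Below is a minimal sketch of mean using a small vector of hourly wages that is made up for illustration (wages is a hypothetical stand-in, not the course dataset). It also shows what happens when you pass a categorical variable stored as a factor: R returns NA with a warning instead of a number.

```r
# Hypothetical hourly wages, made up for illustration
wages <- c(12, 15, 15, 18, 22, 38)

mean(wages)  # (12 + 15 + 15 + 18 + 22 + 38) / 6 = 20

# Hypothetical categorical variable stored as a factor
genre <- factor(c("Comedy", "Drama", "Comedy"))
mean(genre)  # NA, with the warning "argument is not numeric or logical: returning NA"
```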
median
Calculate the median of a quantitative variable. Remember that this command will not work for categorical variables.
## [1] 19.21667
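When there is an even number of values, median averages the two middle values. The sketch below uses the same made-up wages vector (hypothetical data). It shows that the one large value pulls the mean up but leaves the median almost unchanged. Unlike mean, median stops with an error on a factor.

```r
# Hypothetical hourly wages, made up for illustration
wages <- c(12, 15, 15, 18, 22, 38)

median(wages)  # sorted middle values are 15 and 18, so (15 + 18) / 2 = 16.5
mean(wages)    # 20: the large value (38) pulls the mean above the median

# Hypothetical categorical variable: median() refuses factors
genre <- factor(c("Comedy", "Drama", "Comedy"))
try(median(genre))  # reports the error "need numeric data" and carries on
```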
sd
Calculate the standard deviation of a quantitative variable. Remember that this command will not work for categorical variables.
## [1] 16.23676
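sd computes the sample standard deviation, which uses n - 1 in the denominator. As a sketch on made-up data (hypothetical wages), the calculation can be done by hand and checked against sd.

```r
# Hypothetical hourly wages, made up for illustration
wages <- c(12, 15, 15, 18, 22, 38)
n <- length(wages)

# Sum the squared deviations from the mean, divide by n - 1, then take the square root
by_hand <- sqrt(sum((wages - mean(wages))^2) / (n - 1))

sd(wages)
all.equal(sd(wages), by_hand)  # TRUE
```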
IQR
Calculate the interquartile range of a quantitative variable. Remember that this command will not work for categorical variables.
## [1] 17
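The interquartile range is the 75th percentile minus the 25th percentile, so you can check IQR against quantile. A sketch with the same made-up wages (hypothetical data):

```r
# Hypothetical hourly wages, made up for illustration
wages <- c(12, 15, 15, 18, 22, 38)

IQR(wages)  # 6

quartiles <- quantile(wages, probs = c(0.25, 0.75))
quartiles                            # 25%: 15, 75%: 21
unname(quartiles[2] - quartiles[1])  # 6, the same as IQR(wages)
```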
quantile
Calculate percentiles of a distribution. Remember that this command will not work for categorical variables. By default, the quantile command returns the minimum, the three quartiles, and the maximum (the 0th, 25th, 50th, 75th, and 100th percentiles). If you want different percentiles, you will have to list them in the probs argument.
## 0% 25% 50% 75% 100%
## 1.00000 13.00000 19.21667 30.00000 99.99000
## 10% 90%
## 10.000 47.596
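Here is a sketch of both forms on made-up data (hypothetical wages). The probs argument takes proportions between 0 and 1, not percentages. The values in the comments come from R's default interpolation method.

```r
# Hypothetical hourly wages, made up for illustration
wages <- c(12, 15, 15, 18, 22, 38)

quantile(wages)                       # 0%: 12, 25%: 15, 50%: 16.5, 75%: 21, 100%: 38
quantile(wages, probs = c(0.1, 0.9))  # 10%: 13.5, 90%: 30
```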
table
Calculate the absolute frequencies of the categories of a categorical variable.
##
## Action Animation Comedy Drama Family
## 207 139 781 332 155
## Horror Musical/Music Mystery Romance SciFi/Fantasy
## 220 103 39 138 258
## Thriller
## 181
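A minimal sketch of table on a handful of made-up genre labels (genre is hypothetical, not the course's movie data). Categories appear in alphabetical order by default.

```r
# Hypothetical genre labels for six movies
genre <- c("Comedy", "Drama", "Comedy", "Action", "Comedy", "Drama")

table(genre)  # Action: 1, Comedy: 3, Drama: 2
```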
prop.table
Calculate the proportions (i.e. relative frequencies) of the categories of a categorical variable. This command must be run on the output from a table command. You can do both steps in one command by nesting the table command inside the prop.table command.
##
## Action Animation Comedy Drama Family
## 0.08108108 0.05444575 0.30591461 0.13004309 0.06071289
## Horror Musical/Music Mystery Romance SciFi/Fantasy
## 0.08617313 0.04034469 0.01527615 0.05405405 0.10105758
## Thriller
## 0.07089698
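The nesting described above looks like this on the same made-up genre labels (hypothetical data). The proportions always sum to 1.

```r
# Hypothetical genre labels for six movies
genre <- c("Comedy", "Drama", "Comedy", "Action", "Comedy", "Drama")

# table() counts the categories; prop.table() turns the counts into proportions
props <- prop.table(table(genre))
props       # Action: 1/6, Comedy: 3/6, Drama: 2/6
sum(props)  # 1
```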
summary
Provide a summary of a variable, either categorical or quantitative.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 13.00 19.22 24.28 30.00 99.99
## Action Animation Comedy Drama Family
## 207 139 781 332 155
## Horror Musical/Music Mystery Romance SciFi/Fantasy
## 220 103 39 138 258
## Thriller
## 181
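summary picks its output from the type of the variable. This sketch uses made-up data (hypothetical wages and genre). It assumes the categorical variable is stored as a factor, as the course data appears to be. If you pass a plain character vector, summary reports only its length, class, and mode.

```r
# Hypothetical quantitative variable
wages <- c(12, 15, 15, 18, 22, 38)
summary(wages)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

# Hypothetical categorical variable stored as a factor
genre <- factor(c("Comedy", "Drama", "Comedy", "Action", "Comedy", "Drama"))
summary(genre)  # counts per category, like table()
```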
Bivariate Statistics
table
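Can also be used to create a two-way table (a cross-tabulation of two categorical variables) by giving it two variables. The raw counts are hard to compare across rows or columns, so further work is needed to extract useful information from the two-way table.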
##
## G PG PG-13 R
## Action 0 3 98 106
## Animation 37 92 6 4
## Comedy 1 71 352 357
## Drama 2 36 116 178
## Family 16 131 8 0
## Horror 0 1 60 159
## Musical/Music 0 7 57 39
## Mystery 0 0 7 32
## Romance 0 12 64 62
## SciFi/Fantasy 0 10 186 62
## Thriller 0 0 37 144
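A sketch of a two-way table built from made-up genre and rating labels (both hypothetical). The first variable supplies the rows and the second supplies the columns.

```r
# Hypothetical genre and rating for eight movies
genre  <- c("Comedy", "Drama", "Comedy", "Action", "Comedy", "Drama", "Action", "Comedy")
rating <- c("PG",     "R",     "PG-13",  "PG-13",  "R",      "R",     "R",      "PG")

tab <- table(genre, rating)  # rows: genre, columns: rating
tab
```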
prop.table
Calculate conditional distributions from a two-way table. The first argument must be a two-way table output from the table command. It is very important that you also add a second argument that indicates the direction of the conditional distributions. A value of 1 gives you distributions conditional on the row variable (each row sums to 1), and a value of 2 gives you distributions conditional on the column variable (each column sums to 1). Without the second argument, each cell is divided by the grand total.
##
## G PG PG-13 R
## Action 0.000000000 0.014492754 0.473429952 0.512077295
## Animation 0.266187050 0.661870504 0.043165468 0.028776978
## Comedy 0.001280410 0.090909091 0.450704225 0.457106274
## Drama 0.006024096 0.108433735 0.349397590 0.536144578
## Family 0.103225806 0.845161290 0.051612903 0.000000000
## Horror 0.000000000 0.004545455 0.272727273 0.722727273
## Musical/Music 0.000000000 0.067961165 0.553398058 0.378640777
## Mystery 0.000000000 0.000000000 0.179487179 0.820512821
## Romance 0.000000000 0.086956522 0.463768116 0.449275362
## SciFi/Fantasy 0.000000000 0.038759690 0.720930233 0.240310078
## Thriller 0.000000000 0.000000000 0.204419890 0.795580110
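Here is the difference between the two margins on the made-up two-way table from before (hypothetical data, redefined so that the block runs on its own).

```r
# Hypothetical genre and rating for eight movies
genre  <- c("Comedy", "Drama", "Comedy", "Action", "Comedy", "Drama", "Action", "Comedy")
rating <- c("PG",     "R",     "PG-13",  "PG-13",  "R",      "R",     "R",      "PG")
tab <- table(genre, rating)

row_props <- prop.table(tab, 1)  # distribution of rating within each genre
col_props <- prop.table(tab, 2)  # distribution of genre within each rating

round(row_props, 2)
rowSums(row_props)  # every row sums to 1
colSums(col_props)  # every column sums to 1
```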
tapply
Calculate a statistic (e.g. mean, median, sd, IQR) for a quantitative variable across the categories of a categorical variable. The first argument should be the quantitative variable. The second argument should be the categorical variable. The third argument should be the name of the command that will calculate the desired statistic.
## G PG PG-13 R
## 90.80357 99.71901 108.02321 105.25547
## G PG PG-13 R
## 90 96 105 102
## G PG PG-13 R
## 14.63796 13.95487 17.58490 16.07108
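A sketch of tapply on made-up data. It uses a hypothetical wages vector and a hypothetical degree variable that splits the cases into two groups. You pass the name of the statistic (mean, median, sd, and so on) without parentheses.

```r
# Hypothetical wages and a hypothetical grouping variable
wages  <- c(12, 15, 15, 18, 22, 38)
degree <- c("No", "No", "Yes", "No", "Yes", "Yes")

tapply(wages, degree, mean)    # No: 15, Yes: 25
tapply(wages, degree, median)  # No: 15, Yes: 22
tapply(wages, degree, sd)      # No: 3, Yes: about 11.79
```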
cor
Calculate the correlation coefficient between two quantitative variables.
## [1] 0.5454737
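A sketch of cor using two made-up quantitative variables (educ and wages are hypothetical). It checks the result against the definition of the correlation coefficient: the covariance divided by the product of the two standard deviations.

```r
# Hypothetical paired measurements: years of education and hourly wage
educ  <- c(10, 12, 12, 14, 16, 18)
wages <- c(12, 15, 15, 18, 22, 38)

r <- cor(educ, wages)
r

# Same value from the definition
cov(educ, wages) / (sd(educ) * sd(wages))
```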
Statistical Inference
qt
Calculate the t-value needed for a confidence interval. For a 95% confidence interval, the first argument should always be 0.975, which leaves 2.5% of the distribution in each tail. The second argument should be the appropriate degrees of freedom for the statistic and dataset.
## [1] 1.960524
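A sketch of how the t-value feeds into a 95% confidence interval for a mean. It uses made-up data (hypothetical wages) and the usual n - 1 degrees of freedom for a single mean. With very large degrees of freedom, qt(0.975, df) comes out close to 1.96.

```r
# Hypothetical hourly wages, made up for illustration
wages <- c(12, 15, 15, 18, 22, 38)
n <- length(wages)

se     <- sd(wages) / sqrt(n)    # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # t-value for a 95% interval

ci <- mean(wages) + c(-1, 1) * t_crit * se  # lower and upper bounds
ci
```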
pt
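Calculate the p-value for a hypothesis test. pt returns the probability of getting a t-value below its first argument, so the first argument should be the negative of the absolute value of the t-statistic. The second argument should be the appropriate degrees of freedom for the statistic and dataset. For a two-sided test, multiply the result by 2.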
## [1] 0.03578782
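A sketch of a two-sided p-value for a one-sample t-test on made-up data. It assumes a hypothetical null hypothesis that the population mean wage is 15. R's built-in t.test serves as a cross-check.

```r
# Hypothetical hourly wages, made up for illustration
wages <- c(12, 15, 15, 18, 22, 38)
n <- length(wages)

# Hypothetical null hypothesis: population mean = 15
t_stat <- (mean(wages) - 15) / (sd(wages) / sqrt(n))

# Area below -|t|, doubled to count both tails
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
p_value

t.test(wages, mu = 15)$p.value  # should match p_value
```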
OLS Regression Models
lm
Run an OLS regression model. The first argument should always be a formula of the form dependent~independent1+independent2+... To simplify the writing of variable names, it is often useful to specify a second argument, data, that identifies the dataset being used. Then you don't have to include dataset_name$ in the formula. Remember to always put the dependent (y) variable on the left-hand side of the formula.
#simple model with one independent variable
model_simple <- lm(wages~age, data=earnings)
#same simple model but recenter age on 45 years of age
model_recenter <- lm(wages~I(age-45), data=earnings)
#a model with multiple independent variables, both quantitative and qualitative
model_multiple <- lm(wages~I(age-45)+education+race+gender+nchild, data=earnings)
#a model like the previous but also with interaction between gender and nchild
model_interaction <- lm(wages~I(age-45)+education+race+gender*nchild, data=earnings)
Once a model object is created, information can be extracted with either the coef command, which just reports the slopes and intercept, or the summary command, which gives more information.
## (Intercept) I(age - 45) educationHS Diploma
## 17.3568021 0.2242916 4.5382688
## educationAA Degree educationBachelors Degree educationGraduate Degree
## 7.4288321 16.2657784 23.0187910
## raceBlack raceLatino raceAsian
## -3.4176245 -2.1133582 0.5641751
## raceIndigenous raceOther/Multiple genderFemale
## -1.5198248 -0.4331997 -4.3777137
## nchild genderFemale:nchild
## 1.2629571 -0.7490706
##
## Call:
## lm(formula = wages ~ I(age - 45) + education + race + gender *
## nchild, data = earnings)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.638 -7.779 -2.198 4.568 90.578
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.356802 0.159721 108.669 < 2e-16 ***
## I(age - 45) 0.224292 0.002858 78.471 < 2e-16 ***
## educationHS Diploma 4.538269 0.154451 29.383 < 2e-16 ***
## educationAA Degree 7.428832 0.181143 41.011 < 2e-16 ***
## educationBachelors Degree 16.265778 0.164396 98.943 < 2e-16 ***
## educationGraduate Degree 23.018791 0.178161 129.202 < 2e-16 ***
## raceBlack -3.417625 0.123798 -27.607 < 2e-16 ***
## raceLatino -2.113358 0.109491 -19.302 < 2e-16 ***
## raceAsian 0.564175 0.157602 3.580 0.000344 ***
## raceIndigenous -1.519825 0.321284 -4.730 2.24e-06 ***
## raceOther/Multiple -0.433200 0.303134 -1.429 0.152987
## genderFemale -4.377714 0.090203 -48.532 < 2e-16 ***
## nchild 1.262957 0.043476 29.049 < 2e-16 ***
## genderFemale:nchild -0.749071 0.063268 -11.840 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.74 on 145633 degrees of freedom
## Multiple R-squared: 0.284, Adjusted R-squared: 0.2839
## F-statistic: 4443 on 13 and 145633 DF, p-value: < 2.2e-16
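The earnings data is not available here, so this sketch uses R's built-in mtcars dataset as a stand-in. It fits a simple model and a recentered model and then pulls out results with coef and summary. Recentering with I() leaves the slope unchanged. The new intercept is the predicted value at the recentering point (a weight of 3, i.e. 3,000 lbs, chosen arbitrarily here).

```r
# mtcars ships with R; it stands in for the course's earnings data
model_simple   <- lm(mpg ~ wt, data = mtcars)
model_recenter <- lm(mpg ~ I(wt - 3), data = mtcars)  # weight recentered on 3

coef(model_simple)
coef(model_recenter)     # same slope, intercept = predicted mpg at wt = 3
summary(model_recenter)  # standard errors, t values, p-values, R-squared
```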
Utility Functions
round
Round numbers to a given number of decimal places. By default, it rounds to whole numbers, but you can specify the number of decimal places in the second argument.
##
## Action Animation Comedy Drama Family
## 8 5 31 13 6
## Horror Musical/Music Mystery Romance SciFi/Fantasy
## 9 4 2 5 10
## Thriller
## 7
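A sketch that turns made-up genre counts (hypothetical data) into percentages and rounds them, first to whole numbers and then to one decimal place.

```r
# Hypothetical genre labels for six movies
genre <- c("Comedy", "Drama", "Comedy", "Action", "Comedy", "Drama")

pct <- prop.table(table(genre)) * 100  # about 16.67, 50, 33.33 percent
round(pct)     # Action: 17, Comedy: 50, Drama: 33
round(pct, 1)  # Action: 16.7, Comedy: 50.0, Drama: 33.3
```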
sort
Sort a vector of numbers from smallest to largest (the default), or from largest to smallest (with the additional argument decreasing=TRUE).
##
## Comedy Drama SciFi/Fantasy Horror Action
## 31 13 10 9 8
## Thriller Family Animation Romance Musical/Music
## 7 6 5 5 4
## Mystery
## 2
##
## Mystery Musical/Music Animation Romance Family
## 2 4 5 5 6
## Thriller Action Horror SciFi/Fantasy Drama
## 7 8 9 10 13
## Comedy
## 31
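A sketch of sort applied to a table of made-up genre counts (hypothetical data). This puts the categories in order of frequency instead of alphabetical order.

```r
# Hypothetical genre labels for six movies
genre <- c("Comedy", "Drama", "Comedy", "Action", "Comedy", "Drama")
counts <- table(genre)

sort(counts)                     # Action: 1, Drama: 2, Comedy: 3
sort(counts, decreasing = TRUE)  # Comedy: 3, Drama: 2, Action: 1
```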