Common R Commands

Below is a list of common commands that we use for the undergraduate class, along with some examples. You can view a help file in RStudio for each command by searching for the command name in the Help tab in the lower right panel. You can also type the name of the command preceded by a “?” into the console. For example, if you wanted to understand how barplot works, type ?barplot into the console.

The list here does not contain information about making plots in R. That information is in the Plotting Cookbook appendix.

Univariate Statistics

mean

Calculate the mean of a quantitative variable. Remember that this command will not work for categorical variables.

## [1] 24.27601
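The course example above uses a dataset that is not reproduced here. As a minimal sketch, here is the same command on a made-up vector of hourly wages. It also shows what happens when you try it on a categorical variable stored as a factor.

```r
# Hypothetical hourly wages for eight workers (made-up numbers, not course data)
wages <- c(12, 15, 9, 30, 22, 18, 45, 11)
mean(wages)  # 20.25

# A categorical variable stored as a factor: mean() warns and returns NA
genre <- factor(c("Comedy", "Drama", "Comedy"))
mean(genre)
```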

median

Calculate the median of a quantitative variable. Remember that this command will not work for categorical variables.

## [1] 19.21667
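A small sketch with made-up wages (not the course data) shows how the median handles an even number of values. R averages the two middle values once the data are sorted.

```r
# Hypothetical hourly wages; sorted they are 9 11 12 15 18 22 30 45
wages <- c(12, 15, 9, 30, 22, 18, 45, 11)
median(wages)  # (15 + 18) / 2 = 16.5
```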

sd

Calculate the standard deviation of a quantitative variable. Remember that this command will not work for categorical variables.

## [1] 16.23676
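As an illustrative sketch on made-up wages, sd gives the same value as computing the sample standard deviation by hand. Note that the sample formula divides by n - 1, not n.

```r
# Hypothetical hourly wages (made-up numbers)
wages <- c(12, 15, 9, 30, 22, 18, 45, 11)
sd(wages)

# The same value by hand: square root of the sample variance (n - 1 denominator)
n <- length(wages)
sqrt(sum((wages - mean(wages))^2) / (n - 1))
```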

IQR

Calculate the interquartile range of a quantitative variable. Remember that this command will not work for categorical variables.

## [1] 17
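The interquartile range is the distance between the 75th and 25th percentiles. A sketch on made-up wages confirms that IQR matches that difference computed with quantile.

```r
# Hypothetical hourly wages (made-up numbers)
wages <- c(12, 15, 9, 30, 22, 18, 45, 11)
IQR(wages)  # 12.25

# Same result: 75th percentile minus 25th percentile (24 - 11.75)
unname(quantile(wages, 0.75) - quantile(wages, 0.25))
```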

quantile

Calculate percentiles of a distribution. Remember that this command will not work for categorical variables. By default, the quantile command returns the minimum, the three quartiles, and the maximum (the 0th, 25th, 50th, 75th, and 100th percentiles). If you want different percentiles, specify them as proportions in the probs argument (for example, probs=c(0.1, 0.9) for the 10th and 90th percentiles).

##       0%      25%      50%      75%     100% 
##  1.00000 13.00000 19.21667 30.00000 99.99000
##    10%    90% 
## 10.000 47.596
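A sketch on made-up wages shows both the default call and the probs argument. By default R interpolates between data points, so a percentile such as 11.75 need not be one of the observed values.

```r
# Hypothetical hourly wages (made-up numbers)
wages <- c(12, 15, 9, 30, 22, 18, 45, 11)
quantile(wages)                       # 0%, 25%, 50%, 75%, 100%
quantile(wages, probs = c(0.1, 0.9))  # 10th and 90th percentiles
```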

table

Calculate the absolute frequencies (counts) of the categories of a categorical variable.

## 
##        Action     Animation        Comedy         Drama        Family 
##           207           139           781           332           155 
##        Horror Musical/Music       Mystery       Romance SciFi/Fantasy 
##           220           103            39           138           258 
##      Thriller 
##           181
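A minimal sketch using made-up genres for six movies (not the course movie data). The categories appear in alphabetical order.

```r
# Hypothetical genres for six movies
genre <- c("Comedy", "Drama", "Comedy", "Horror", "Comedy", "Drama")
counts <- table(genre)
counts  # Comedy 3, Drama 2, Horror 1
```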

prop.table

Calculate the proportions (i.e. relative frequencies) of the categories of a categorical variable. This command must be run on the output from a table command. You can do that in one command by nesting the table command inside the prop.table command.

## 
##        Action     Animation        Comedy         Drama        Family 
##    0.08108108    0.05444575    0.30591461    0.13004309    0.06071289 
##        Horror Musical/Music       Mystery       Romance SciFi/Fantasy 
##    0.08617313    0.04034469    0.01527615    0.05405405    0.10105758 
##      Thriller 
##    0.07089698
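Here is a sketch of the nested call on made-up genres. The proportions are just the counts divided by the total number of observations, so they always sum to 1.

```r
# Hypothetical genres for six movies
genre <- c("Comedy", "Drama", "Comedy", "Horror", "Comedy", "Drama")
props <- prop.table(table(genre))  # table() nested inside prop.table()
props  # Comedy 0.5, Drama 0.333, Horror 0.167
```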

summary

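Provide a summary of a variable, either categorical or quantitative. For a quantitative variable, summary reports the minimum, first quartile, median, mean, third quartile, and maximum. For a categorical variable stored as a factor, it reports the count in each category.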

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   13.00   19.22   24.28   30.00   99.99
##        Action     Animation        Comedy         Drama        Family 
##           207           139           781           332           155 
##        Horror Musical/Music       Mystery       Romance SciFi/Fantasy 
##           220           103            39           138           258 
##      Thriller 
##           181
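A sketch with made-up data shows both cases. One caveat: the categorical variable needs to be a factor. For a plain character vector, summary reports only its length, class, and mode.

```r
# Hypothetical quantitative variable
wages <- c(12, 15, 9, 30, 22, 18, 45, 11)
summary(wages)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

# Hypothetical categorical variable stored as a factor: counts per category
genre <- factor(c("Comedy", "Drama", "Comedy", "Horror", "Comedy", "Drama"))
summary(genre)

# The same values as a character vector only give Length, Class, and Mode
summary(as.character(genre))
```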

Bivariate Statistics

table

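Create a two-way table of two categorical variables. The first variable defines the rows and the second defines the columns. Raw counts in a two-way table are hard to compare across categories, so further work (usually with prop.table) is needed to extract useful information from it.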

##                
##                   G  PG PG-13   R
##   Action          0   3    98 106
##   Animation      37  92     6   4
##   Comedy          1  71   352 357
##   Drama           2  36   116 178
##   Family         16 131     8   0
##   Horror          0   1    60 159
##   Musical/Music   0   7    57  39
##   Mystery         0   0     7  32
##   Romance         0  12    64  62
##   SciFi/Fantasy   0  10   186  62
##   Thriller        0   0    37 144
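A minimal sketch with made-up genres and ratings for six movies. The first argument supplies the rows and the second supplies the columns.

```r
# Hypothetical genre and rating for six movies
genre  <- c("Comedy", "Drama", "Comedy", "Horror", "Comedy", "Drama")
rating <- c("PG-13", "R", "R", "R", "PG-13", "PG-13")
tab <- table(genre, rating)  # genre in rows, rating in columns
tab
```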

prop.table

Calculate the conditional distributions from a two-way table. The first argument must be a two-way table output from the table command. It is very important that you also add a second argument that indicates the direction of the conditional distributions: 1 gives distributions conditional on the row variable (each row sums to one), and 2 gives distributions conditional on the column variable (each column sums to one). Without this second argument, prop.table divides every cell by the grand total.

##                
##                           G          PG       PG-13           R
##   Action        0.000000000 0.014492754 0.473429952 0.512077295
##   Animation     0.266187050 0.661870504 0.043165468 0.028776978
##   Comedy        0.001280410 0.090909091 0.450704225 0.457106274
##   Drama         0.006024096 0.108433735 0.349397590 0.536144578
##   Family        0.103225806 0.845161290 0.051612903 0.000000000
##   Horror        0.000000000 0.004545455 0.272727273 0.722727273
##   Musical/Music 0.000000000 0.067961165 0.553398058 0.378640777
##   Mystery       0.000000000 0.000000000 0.179487179 0.820512821
##   Romance       0.000000000 0.086956522 0.463768116 0.449275362
##   SciFi/Fantasy 0.000000000 0.038759690 0.720930233 0.240310078
##   Thriller      0.000000000 0.000000000 0.204419890 0.795580110
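A sketch on made-up data shows both directions. A quick way to check which one you got is to see whether the rows or the columns sum to one.

```r
# Hypothetical genre and rating for six movies
genre  <- c("Comedy", "Drama", "Comedy", "Horror", "Comedy", "Drama")
rating <- c("PG-13", "R", "R", "R", "PG-13", "PG-13")
tab <- table(genre, rating)

# 1: distribution of rating within each genre (each row sums to 1)
by_genre <- prop.table(tab, 1)
by_genre

# 2: distribution of genre within each rating (each column sums to 1)
by_rating <- prop.table(tab, 2)
by_rating
```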

tapply

Calculate a statistic (e.g. mean, median, sd, IQR) for a quantitative variable across the categories of a categorical variable. The first argument should be the quantitative variable. The second argument should be the categorical variable. The third argument should be the name of the command that will calculate the desired statistic.

##         G        PG     PG-13         R 
##  90.80357  99.71901 108.02321 105.25547
##     G    PG PG-13     R 
##    90    96   105   102
##        G       PG    PG-13        R 
## 14.63796 13.95487 17.58490 16.07108
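A sketch using made-up runtimes and ratings (not the course movie data). Only the third argument changes from one call to the next.

```r
# Hypothetical runtimes (minutes) and ratings for six movies
runtime <- c(95, 110, 88, 102, 120, 99)
rating  <- c("PG", "PG-13", "PG", "R", "PG-13", "R")
tapply(runtime, rating, mean)    # PG 91.5, PG-13 115, R 100.5
tapply(runtime, rating, median)
tapply(runtime, rating, sd)
```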

cor

Calculate the correlation coefficient between two quantitative variables.

## [1] 0.5454737
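A small sketch with made-up paired values. The correlation coefficient is symmetric, so the order of the two variables does not matter.

```r
# Hypothetical paired observations of two quantitative variables
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
cor(x, y)  # about 0.775
cor(y, x)  # same value
```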

Statistical Inference

nrow

Return the number of observations in a dataset.

## [1] 51
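A minimal sketch with a hypothetical data frame, where each row is one observation. The sample size returned by nrow is what you typically use to compute degrees of freedom for the commands that follow.

```r
# Hypothetical data frame: three observations, two variables
df <- data.frame(value = c(1.2, 3.4, 5.6), group = c("a", "b", "a"))
n <- nrow(df)
n      # 3
n - 1  # degrees of freedom for a single mean
```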

qt

Calculate the t-value needed for a confidence interval. For a 95% confidence interval, the first argument should be 0.975, because the remaining 5% is split evenly between the two tails (2.5% in each). The second argument should be the appropriate degrees of freedom for the statistic and dataset.

## [1] 1.960524
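As a sketch, here is a 95% confidence interval for a mean computed from a made-up sample and cross-checked against R's built-in t.test. With many degrees of freedom, the t-value gets close to 1.96. With small samples like this one, it is noticeably larger.

```r
# Hypothetical sample (made-up numbers)
x <- c(12, 15, 9, 30, 22, 18, 45, 11)
n <- length(x)
se <- sd(x) / sqrt(n)            # standard error of the mean
t_star <- qt(0.975, df = n - 1)  # 2.5% in each tail
ci <- mean(x) + c(-1, 1) * t_star * se
ci

# Cross-check with the built-in one-sample t-test
t.test(x)$conf.int
```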

pt

Calculate the p-value for a hypothesis test. The first argument should be the negative of the absolute value of the t-statistic, so that pt returns the area in the lower tail beyond it. The second argument should be the appropriate degrees of freedom for the statistic and dataset. For a two-sided test, multiply the result by 2.

## [1] 0.03578782
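A sketch of a two-sided test of a mean against a hypothetical null value of 15, using made-up data. Taking the negative of the absolute value gives the correct tail whether t is positive or negative. Doubling that tail area reproduces the p-value from t.test.

```r
# Hypothetical sample and null hypothesis value
x <- c(12, 15, 9, 30, 22, 18, 45, 11)
mu0 <- 15
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
deg_free <- length(x) - 1

one_tail <- pt(-abs(t_stat), deg_free)  # area below -|t|
p_value <- 2 * one_tail                 # two-sided p-value
p_value

# Cross-check with the built-in one-sample t-test
t.test(x, mu = mu0)$p.value
```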

OLS Regression Models

lm

Run an OLS regression model. The first argument should always be a formula of the form dependent~independent1+independent2+.... To simplify the writing of variable names, it is often useful to specify a second argument, data, that identifies the dataset being used. Then you don’t have to include dataset_name$ in the formula. Remember to always put the dependent (y) variable on the left-hand side of the ~.

Once a model object is created, you can extract information from it with either the coef command, which reports only the intercept and slopes, or the summary command, which also reports standard errors, t-statistics, p-values, and R-squared.

##               (Intercept)               I(age - 45)       educationHS Diploma 
##                17.3568021                 0.2242916                 4.5382688 
##        educationAA Degree educationBachelors Degree  educationGraduate Degree 
##                 7.4288321                16.2657784                23.0187910 
##                 raceBlack                raceLatino                 raceAsian 
##                -3.4176245                -2.1133582                 0.5641751 
##            raceIndigenous        raceOther/Multiple              genderFemale 
##                -1.5198248                -0.4331997                -4.3777137 
##                    nchild       genderFemale:nchild 
##                 1.2629571                -0.7490706
## 
## Call:
## lm(formula = wages ~ I(age - 45) + education + race + gender * 
##     nchild, data = earnings)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -43.638  -7.779  -2.198   4.568  90.578 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               17.356802   0.159721 108.669  < 2e-16 ***
## I(age - 45)                0.224292   0.002858  78.471  < 2e-16 ***
## educationHS Diploma        4.538269   0.154451  29.383  < 2e-16 ***
## educationAA Degree         7.428832   0.181143  41.011  < 2e-16 ***
## educationBachelors Degree 16.265778   0.164396  98.943  < 2e-16 ***
## educationGraduate Degree  23.018791   0.178161 129.202  < 2e-16 ***
## raceBlack                 -3.417625   0.123798 -27.607  < 2e-16 ***
## raceLatino                -2.113358   0.109491 -19.302  < 2e-16 ***
## raceAsian                  0.564175   0.157602   3.580 0.000344 ***
## raceIndigenous            -1.519825   0.321284  -4.730 2.24e-06 ***
## raceOther/Multiple        -0.433200   0.303134  -1.429 0.152987    
## genderFemale              -4.377714   0.090203 -48.532  < 2e-16 ***
## nchild                     1.262957   0.043476  29.049  < 2e-16 ***
## genderFemale:nchild       -0.749071   0.063268 -11.840  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.74 on 145633 degrees of freedom
## Multiple R-squared:  0.284,  Adjusted R-squared:  0.2839 
## F-statistic:  4443 on 13 and 145633 DF,  p-value: < 2.2e-16
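Here is a minimal sketch of the same workflow on a small hypothetical data frame (not the course's earnings data). Categorical predictors stored as factors enter as indicator variables, with the first level in alphabetical order as the reference. In this case that produces a genderMale coefficient.

```r
# Hypothetical data (made-up numbers, not the course's earnings data)
workers <- data.frame(
  wages  = c(12, 15, 9, 30, 22, 18, 45, 11),
  age    = c(25, 31, 22, 48, 40, 35, 55, 28),
  gender = factor(c("Male", "Female", "Female", "Male",
                    "Female", "Male", "Male", "Female"))
)

# Dependent variable on the left of ~; data= means we can skip workers$
model <- lm(wages ~ age + gender, data = workers)
coef(model)     # intercept and slopes only
summary(model)  # adds standard errors, t values, p-values, R-squared
```

If you want to do arithmetic on a variable inside the formula, wrap it in I(), as in the centered I(age - 45) term in the output above. Otherwise R reads operators such as - as formula syntax rather than arithmetic.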

Utility Functions

round

Round numbers to a given number of decimal places. By default, it rounds to whole numbers, but you can specify the number of decimal places in the second argument.

## 
##        Action     Animation        Comedy         Drama        Family 
##             8             5            31            13             6 
##        Horror Musical/Music       Mystery       Romance SciFi/Fantasy 
##             9             4             2             5            10 
##      Thriller 
##             7
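A sketch of the pattern behind the output above: converting proportions from made-up data into percentages and then rounding them.

```r
# Hypothetical genres for six movies
genre <- c("Comedy", "Drama", "Comedy", "Horror", "Comedy", "Drama")
pct <- prop.table(table(genre)) * 100  # percentages
round(pct)     # whole numbers: 50, 33, 17
round(pct, 1)  # one decimal place: 50.0, 33.3, 16.7
```

One thing to be aware of is that when a value is exactly halfway between two candidates, R rounds to the even one. For example, round(2.5) gives 2.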

sort

Sort a vector of numbers from smallest to largest (default), or largest to smallest (with additional argument decreasing=TRUE).

## 
##        Comedy         Drama SciFi/Fantasy        Horror        Action 
##            31            13            10             9             8 
##      Thriller        Family     Animation       Romance Musical/Music 
##             7             6             5             5             4 
##       Mystery 
##             2
## 
##       Mystery Musical/Music     Animation       Romance        Family 
##             2             4             5             5             6 
##      Thriller        Action        Horror SciFi/Fantasy         Drama 
##             7             8             9            10            13 
##        Comedy 
##            31
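A sketch that sorts a frequency table built from made-up genres. Sorting a table keeps the category labels attached to their counts.

```r
# Hypothetical genres for six movies
genre <- c("Comedy", "Drama", "Comedy", "Horror", "Comedy", "Drama")
counts <- table(genre)
sort(counts)                     # smallest to largest: Horror, Drama, Comedy
sort(counts, decreasing = TRUE)  # largest to smallest: Comedy, Drama, Horror
```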