Preface
Understanding Data
What Does Data Look Like?
The observations
The variables
What Can We Do With Data?
How is a variable distributed?
Measuring association
Making statistical inferences
Building Models
Observational Data, Experimental Thinking
The Distribution of a Variable
Looking at Distributions
Looking at the distribution of a categorical variable
Looking at the distribution of a quantitative variable
Measuring the Center of a Distribution
The mean
The median
The mode
Comparing the mean and median
Percentiles and the Five Number Summary
Percentiles
The five number summary
Boxplots
Measuring the Spread of a Distribution
Range and interquartile range
Variance and standard deviation
Measuring Association
The Two-Way Table
Conditional distributions
Odds ratio (advanced)
Mean Differences
Graphically examining differences in distributions
Comparing differences in the mean
Scatterplot and Correlation Coefficient
The scatterplot
The correlation coefficient
Statistical Inference
The Problem of Statistical Inference
The Concept of the Sampling Distribution
Example: class height
Central limit theorem and the normal distribution
What can we do with the sampling distribution?
Confidence Intervals
What do we mean by “confident”?
Calculating the confidence interval for the sample mean
Calculating the confidence interval for other sample statistics
Example with proportions
Example with mean differences
Example with proportion differences
Example with correlation coefficient
Hypothesis Tests
Example: Coke winners
The general procedure of hypothesis testing
Hypothesis tests of relationships
Building Models
The OLS Regression Line
The Formula for a Line
Calculating the Best-Fitting Line
Using the lm command to calculate OLS regression lines in R
Adding an OLS regression line to a plot
The OLS regression line as a model
Interpreting Slopes and Intercepts
How good is \(x\) as a predictor of \(y\)?
Inference for OLS Regression models
Regression Line Cautions
The Power of Controlling for Other Variables
Interpreting results in multivariate OLS regression models
Including more than two independent variables
How to read a table of regression results
Including Categorical Variables as Predictors
Indicator variables
Categorical variables with more than two categories
Categorical and quantitative variables combined in a single model
Interaction Terms
The nature of additive models
The interaction term
Interpreting interaction terms
Interaction terms in R
Interaction terms with multiple categories
Interaction terms with two categorical variables
Model Complications
The Linear Model, Revisited
Reformulating the linear model
Marginal effects
Linear model assumptions
Estimating a linear model
Modeling Non-Linearity
Smoothing
Residual Plots
Transformations
Polynomial Models
Splines
The IID Violation and Robust Standard Errors
Violations of independence
Heteroscedasticity
Fixing IID violations
Weighted least squares
Robust standard errors
Sample Design and Weighting
Cluster/multistage sampling
Stratification
Weights
Correcting for sample design in models
Missing Values
Identifying Valid Skips
Kinds of Missingness
Removing Cases
Imputation
Multicollinearity and Scale Creation
Avoid the Singularity
Detecting Data-Based Multicollinearity
Addressing Multicollinearity
Creating Scales
Factor Analysis
Model Selection
There is no “right” model
The accuracy vs. parsimony tradeoff
Null vs. saturated model
A not-very-useful tool: the F-test
Tools with a parsimony penalty
Model Averaging
Modeling Categorical Outcomes
Dichotomous Outcomes and The Binomial Distribution
Linear Probability Model
Generalized Linear Model
Maximum Likelihood Estimation
Logistic Regression Model
Models for Nominal Polytomous Outcomes
Models for Ordinal Polytomous Outcomes
Appendices
Useful References
Example Datasets
Crimes
Movies
Politics
Popularity
Sex
Titanic
Wages
Common R Commands
Univariate Statistics
Bivariate Statistics
Statistical Inference
OLS Regression Models
Utility functions
Plotting Cookbook
Barplots
Histograms
Boxplots
Comparative Barplots
Comparative Boxplots
Scatterplots
R Stat Lab
Using Scripts
Getting Started with Scripts
Not Everything Goes Into Your Script
Commenting for Sanity
Object Types
Atomic Modes
Vectors and Matrices
Factors
Logical Values and Boolean Statements
Missing Values
Lists
Data Frames
Pretty Pictures
Base Plot
ggplot
Using Git
Plain Text is Better
Git
Reading and Writing Data
Data Formats
Plain text files
Data in binary format
Saving data
Cleaning Data
The Most Important Rule: Check yourself before you wreck yourself
Assigning missing values
Recoding
Collapsing Categorical Variables
Transforming Quantitative Variables
After Cleaning You Still Need to Tidy
Aggregating Data
Reshaping and Merging Data
Reshaping
Merging data
Programming
An Example: Theil’s H
Our Data
Calculating Theil’s H for a single state
Creating Functions
Iteration
Putting It All Together
Using R Markdown
Plain Text Science
Markdown Syntax
R Markdown
Figures in R Markdown
Statistical Analysis in Sociology
Models for Nominal Polytomous Outcomes