Creating bar plots and box plots of a variable by several categories (factors) in R

# Read the data into R
readincamp <- read.csv("competitivereadingcamp.csv")

# Attach readincamp so its columns can be referenced by name until detach(readincamp) is called
attach(readincamp)

# Summarize the data with the summarySE command
# Source: the excellent http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_%28ggplot2%29/
# summarySE gives the count, mean, standard deviation, standard error of the mean,
# and confidence interval (default 95%) for each group.
# Its arguments:
#   measurevar: the name of the column that contains the variable to be summarized
#   groupvars: a vector containing the names of the columns that contain grouping variables
#   na.rm: a boolean option that indicates whether to ignore NAs (missing values)
#   conf.interval: the confidence level of the interval (default is .95, i.e. 95%)
# To use summarySE, the bear package has to be installed once.
# (The Rmisc package also provides a summarySE function, if bear is not available.)
install.packages("bear")
library(bear)
# rc2 will be a new data frame; its column score1 holds the group means.
rc2 <- summarySE(readincamp, measurevar="score1", groupvars=c("treatment", "female"))
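
# A minimal base-R sketch of the same kind of summary, in case summarySE is not at hand.
# Assumptions: demo, summarise_group, and rc2_demo are hypothetical names used only here,
# and demo is simulated stand-in data, not the reading camp data.
set.seed(1)
demo <- data.frame(
  treatment = rep(c(0, 1), each = 20),
  female = rep(c(0, 1), times = 20),
  score1 = rnorm(40, mean = 20, sd = 4)
)
# Count, mean, sd, standard error, and confidence-interval half-width for one group
summarise_group <- function(x, conf.interval = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)
  c(N = n, mean = mean(x), sd = sd(x), se = se,
    ci = se * qt(conf.interval / 2 + 0.5, n - 1))
}
# aggregate returns the five statistics as a matrix column, so unpack it into ordinary columns
agg <- aggregate(score1 ~ treatment + female, data = demo, FUN = summarise_group)
rc2_demo <- cbind(agg[c("treatment", "female")], as.data.frame(agg$score1))
# Name the mean column after the measured variable, as summarySE does
names(rc2_demo)[names(rc2_demo) == "mean"] <- "score1"
print(rc2_demo)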

# In the new data frame rc2, make treatment and female into factors rather than numeric variables
rc2$treatment2 <- factor(rc2$treatment)
rc2$female2 <- factor(rc2$female)

# Now use ggplot to make the bar plot
# Needs install.packages("ggplot2") once, and then library(ggplot2)
# Error bars can represent the standard error (se) or the confidence interval (ci) of the mean; ci is used here
library(ggplot2)
ggplot(rc2, aes(x=treatment2, y=score1, fill=female2)) +
  geom_bar(position=position_dodge(), stat="identity") + # Bars side by side, heights taken from score1
  geom_errorbar(aes(ymin=score1-ci, ymax=score1+ci),
                size=.3,    # Thinner lines (ggplot2 3.4.0 and later prefer linewidth=.3)
                width=.2,
                position=position_dodge(.9)) +
  xlab("Treatment") +
  ylab("Score 1") +
  scale_fill_hue(name="Gender", # Legend label
                 breaks=c("0", "1"),
                 labels=c("Male", "Female")) +
  ggtitle("The Effect of Treatment on Test Score1 (with confidence intervals)") +
  scale_y_continuous(breaks=0:20*2) + # Control ticks on y axis
  theme_bw() # Make background white

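# Make a boxplot of the scores by gender and competitive treatment
# Here is some great help: http://www.r-bloggers.com/box-plot-with-r-tutorial/
# Boxes are drawn with the first factor in the formula varying fastest, so competitive*female
# gives the order used in names below (assuming 0/1 coding, with 1 = competitive and 1 = female)
par(mar = c(8, 5, 4, 2) + 0.1) # Larger bottom margin for the rotated labels
boxplot(score1 ~ competitive*female, main="Scores on reading test",
        xlab="", ylab="Score 1",
        col=c("white","gray"), las=2,
        at=c(1,2,4,5),
        names=c("men, noncomp","men, comp","women, noncomp","women, comp"))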
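
# A small sketch for checking the order in which boxplot draws the groups of a two-factor formula,
# so that the names vector can be matched to it. boxplot splits the data the same way split() does.
# Assumptions: demo_order and box_order are hypothetical names, and the data are made up.
demo_order <- data.frame(
  score1 = 1:8,
  competitive = rep(c(0, 1), times = 4),
  female = rep(c(0, 1), each = 4)
)
# Groups for score1 ~ competitive*female, labelled competitive.female
box_order <- names(split(demo_order$score1,
                         list(demo_order$competitive, demo_order$female)))
print(box_order) # "0.0" "1.0" "0.1" "1.1": the first factor varies fastest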

About mkevane

Economist at Santa Clara University and Director of Friends of African Village Libraries.