Calculating means by categorical variables (factors) in R

# These are some of the different ways to do the calculation that social scientists probably
# do most frequently: calculating means for different groups or conditions.
# Read the data into R (the data are from some reading camps run in Ghana)
readincamp = read.csv("competitivereadingcamp.csv")

# Tell R to look up variable names in readincamp from now on, until detach(readincamp) is called
attach(readincamp)
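# A side note on attach: a minimal sketch of with(), which looks up column names
# in a data frame for a single expression only. The data frame toy and its values
# are made up for illustration; they are not the reading camp data.

```r
# Hypothetical toy data standing in for the reading camp file
toy <- data.frame(
  score1 = c(10, 12, NA, 20),
  female = c(0, 1, 0, 1)
)

# with() evaluates the expression using the columns of toy,
# without changing the search path the way attach() does
overall_mean <- with(toy, mean(score1, na.rm = TRUE))
print(overall_mean)
```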

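# Find the means of score1 (the initial scores), by gender and by type of reading camp
# summaryBy comes from the doBy package, so load it first (install.packages("doBy") if needed)
library(doBy)
summaryBy(score1 ~ female + competitive, data = readincamp, FUN = c(mean), na.rm = TRUE)
summaryBy(score1 ~ female + africanbooks, data = readincamp, FUN = c(mean), na.rm = TRUE)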
# Another way to do it: tapply from base R (the bare column names work because of attach above)
print(tapply(X = score1, INDEX = list(africanbooks, female), FUN = mean, na.rm = TRUE))
print(tapply(X = score1, INDEX = list(competitive, female), FUN = mean, na.rm = TRUE))
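# Yet another option, sketched here with base R only: aggregate() with a formula
# returns the group means as a data frame (one row per group) rather than as a matrix.
# The data frame toy below is invented for illustration and only borrows the column
# names score1, female and competitive; it is not the reading camp data.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(
  score1      = c(10, 14, 20, 22, NA, 30, 16),
  female      = c(0, 0, 1, 1, 0, 1, 0),
  competitive = c(0, 0, 0, 1, 1, 1, 1)
)

# tapply returns a matrix: rows are competitive, columns are female
means_matrix <- tapply(toy$score1, list(toy$competitive, toy$female),
                       mean, na.rm = TRUE)
print(means_matrix)

# aggregate returns a data frame with one row per group combination;
# with the formula interface, rows with a missing score1 are dropped by default
means_long <- aggregate(score1 ~ female + competitive, data = toy, FUN = mean)
print(means_long)
```

# The matrix form is easier to read at a glance; the data frame form is easier
# to merge with other data or pass on to plotting functions.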

# Another way: Summarize data with summarySE command

# Source: the excellent http://www.cookbook-r.com/Manipulating_data/Summarizing_data/
# To do this, first install the bear package with install.packages("bear")
# summarySE gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
# rc2 will be a new data frame.
# measurevar: the name of a column that contains the variable to be summarized
# groupvars: a vector containing names of columns that contain grouping variables
# na.rm: a boolean option that indicates whether to ignore NAs (missing values)
# conf.interval: the percent range of the confidence interval (default is 95%)
install.packages("bear")
library(bear)
rc2
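# For readers without the package, here is a rough base R sketch that computes the
# same kinds of quantities listed in the comments above (count, mean, standard
# deviation, standard error, and a t-based confidence interval half-width) for each
# group. It is not the summarySE code itself: the helper summarize_scores and the
# data frame toy are hypothetical names with made-up values, used only for illustration.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(
  score1 = c(10, 14, 20, 22, NA, 30, 16),
  female = c(0, 0, 1, 1, 0, 1, 0)
)

# Hypothetical helper: summary statistics for one numeric vector
summarize_scores <- function(x, conf.interval = 0.95) {
  x  <- x[!is.na(x)]   # drop missing values
  n  <- length(x)      # count of non-missing observations
  m  <- mean(x)
  s  <- sd(x)
  se <- s / sqrt(n)    # standard error of the mean
  # Half-width of the confidence interval, from the t distribution
  ci <- se * qt(conf.interval / 2 + 0.5, df = n - 1)
  c(N = n, mean = m, sd = s, se = se, ci = ci)
}

# Split the scores by group, summarize each group, and stack the results
groups <- split(toy$score1, toy$female)
summary_table <- do.call(rbind, lapply(groups, summarize_scores))
print(summary_table)
```

# Each row of summary_table is one group (here, one value of female); the confidence
# interval for a group runs from mean - ci to mean + ci.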

About mkevane

Economist at Santa Clara University and Director of Friends of African Village Libraries.