(two sample t-test for groups in data frame here)
First we create a data frame, on which we can work
- x - the data frame
- var - the variable which should be testet
- mean - the mean of the t-test
now we can use ddply in the following way:
First we create a data frame, on which we can work
x <- rnorm(100) y <- sample(1:3, 100, replace=T) z <- sample(1:2, 100, replace=T) df <- data.frame(group=y,sex=z,sds=x) head(df)
group sex sds 1 1 2 1.3412663 2 1 1 0.8000326 3 2 2 0.0371029 4 1 2 0.2064684 5 1 2 1.6429816 6 3 1 1.3271138In the following I use ddply() from the plyr-package, so if you did not install and/or load it you have to do it now - the command library() without any argument gives you a list of the installed packages:
install.packages("plyr") # install the package library(plyr) # load itHere is the function for doing the t.tests:
t.test.plyr <- function(x, var, mean=0 ){ y <- rep(NA,10) y[6] <- nrow(x)[1] # count observations if(nrow(x) < 2) return(y) # exits if too less observations res <- t.test(x[var], mu=mean) # doing the test y[1] <- res$statistic # extract values of interest y[2] <- res$p.value y[3] <- res$estimate y[4] <- res$conf.int[1] y[5] <- res$conf.int[2] y[7] <- res$parameter y[8] <- res$method y[9] <- res$alternative y[10] <- res$null.value names(y) <- c("statistic","p.value","estimate","conf.int1", "conf.int2", "nobs","dof","method","alternative","null.value") y }where t.test.plyr() is a function of:
- x - the data frame
- var - the variable which should be testet
- mean - the mean of the t-test
now we can use ddply in the following way:
result <- ddply(df, .(group, sex), t.test.plyr, "sds") result
group sex statistic p.value estimate 1 1 1 0.774917157596946 0.445379429071851 0.132214420988956 2 1 2 -0.415594591752269 0.68359099812399 -0.0987341821354776 3 2 1 0.130257003222609 0.899578855830346 0.0602600950757679 4 2 2 -0.0579045286518569 0.954965406753824 -0.0150105343694587 5 3 1 1.18044920537142 0.256202807853187 0.406567809727671 6 3 2 1.49215606704641 0.15126489039304 0.294865619363907 conf.int1 conf.int2 nobs dof method alternative 1 -0.218494854049804 0.482923696027717 27 26 One Sample t-test two.sided 2 -0.605109702463049 0.407641338192093 16 15 One Sample t-test two.sided 3 -1.00655416438508 1.12707435453662 9 8 One Sample t-test two.sided 4 -0.592608791299465 0.562587722560548 11 10 One Sample t-test two.sided 5 -0.327541518602652 1.14067713805799 16 15 One Sample t-test two.sided 6 -0.117342538637965 0.70707377736578 21 20 One Sample t-test two.sided null.value 1 0 2 0 3 0 4 0 5 0 6 0If you want to change the mu in you t.test, add the argument mean:
result <- ddply(df, .(group, sex), t.test.plyr, "sds", mean=1) resultIf there is no need or you do not want a data frame as result you can just use tapply(). tapply results in a list with as many elements as subsets, each element contains the results of each t.test (as list).
res <- tapply(df$sds, list(df$sex,df$group), t.test) res
1 2 3 1 List,9 List,9 List,9 2 List,9 List,9 List,9So the first element of res contains the t.test results for sex=1 and group=1 etc pp
res[[1]]
One Sample t-test data: X[[1L]] t = 0.7749, df = 26, p-value = 0.4454 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: -0.2184949 0.4829237 sample estimates: mean of x 0.1322144
What you are doing here is basically testing if any value has a mean significantly different from zero, right? Would it not be more interesting to test the groups against each other?
ReplyDeleteActually, that's what I wanted to code when your blog showed up as a search result.
do you mean something like this?
Delete> xx <- data.frame(vals=rnorm(100),group=sample(letters[1:2],100,replace = T))
> t.test(xx$vals~xx$group)
Welch Two Sample t-test
data: xx$vals by xx$group
t = 0,6923, df = 84,572, p-value = 0,4906
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0,2801770 0,5794806
sample estimates:
mean in group a mean in group b
0,02413266 -0,12551915
yes, i was looking for the same thing. do you have an idea how to implement this? i have been trying to add an addition variable like this
ReplyDeletet.test.plyr <- function(x, var,group){
y <- rep(NA,10)
y[6] <- nrow(x)[1] # count observations
if(nrow(x) < 2) return(y) # exits if too less observations
res <- t.test(x[var]~x[group]) # doing the test
y[1] <- res$statistic # extract values of interest
y[2] <- res$p.value
y[3] <- res$estimate
y[4] <- res$conf.int[1]
y[5] <- res$conf.int[2]
y[7] <- res$parameter
y[8] <- res$method
y[9] <- res$alternative
y[10] <- res$null.value
names(y) <- c("statistic","p.value","estimate","conf.int1", "conf.int2", "nobs","dof","method","alternative","null.value")
y
}
result <- ddply(df, .(group), t.test.plyr, "sds", "sex")
but its giving me
Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In var(x) :
Calling var(x) on a factor x is deprecated and will become an error.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
3: In mean.default(y) : argument is not numeric or logical: returning NA
4: In var(y) :
Calling var(x) on a factor x is deprecated and will become an error.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
Called from: t.test.default(x = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), y = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L))
Any Ideas?