Apply function conditionally
Question
I have a dataframe like this:
experiment iter results
A 1 30.0
A 2 23.0
A 3 33.3
B 1 313.0
B 2 323.0
B 3 350.0
....
Is there a way to tally results by applying a function with a condition? In the above example, the condition is "all iterations of a particular experiment":
A: sum of results (30 + 23 + 33.3)
B: sum of results (313 + 323 + 350)
I am thinking of the "apply" function, but can't find a way to get it to work.
Recommended answer
There are a lot of alternatives for doing this. Note that if you are interested in a function other than sum, you only need to change the argument FUN=any.function. For example, if you want mean, var, length, etc., plug those functions into the FUN argument: FUN=mean, FUN=var and so on. Let's explore some alternatives:
The aggregate function in base:
> aggregate(results ~ experiment, FUN=sum, data=DF)
experiment results
1 A 86.3
2 B 986.0
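The calls in this answer assume a data frame named DF that is never constructed in the thread. Here is a minimal sketch that rebuilds it from the six rows shown in the question and swaps FUN to mean, as described above. The variable name mean_by_exp is only illustrative.

```r
# Rebuild the example data from the question.
# The name DF is assumed from the calls used in this answer.
DF <- data.frame(
  experiment = rep(c("A", "B"), each = 3),
  iter       = rep(1:3, times = 2),
  results    = c(30.0, 23.0, 33.3, 313.0, 323.0, 350.0)
)

# Same aggregate() call as above, but with mean instead of sum.
mean_by_exp <- aggregate(results ~ experiment, data = DF, FUN = mean)
print(mean_by_exp)
```

The same DF can be used with every other alternative in this answer.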
Or maybe tapply?
> with(DF, tapply(results, experiment, FUN=sum))
A B
86.3 986.0
There is also ddply from the plyr package:
> # library(plyr)
> ddply(DF[, -2], .(experiment), numcolwise(sum))
experiment results
1 A 86.3
2 B 986.0
> ## Alternative syntax
> ddply(DF, .(experiment), summarize, sumResults = sum(results))
experiment sumResults
1 A 86.3
2 B 986.0
And the dplyr package:
> require(dplyr)
> DF %>% group_by(experiment) %>% summarise(sumResults = sum(results))
Source: local data frame [2 x 2]
experiment sumResults
1 A 86.3
2 B 986.0
Using sapply and split, which is equivalent to tapply:
> with(DF, sapply(split(results, experiment), sum))
A B
86.3 986.0
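To see why this matches tapply, it can help to look at the intermediate step. The sketch below assumes DF is built from the rows in the question; the names groups, via_sapply and via_tapply are illustrative only.

```r
# Example data rebuilt from the question (the name DF is assumed).
DF <- data.frame(
  experiment = rep(c("A", "B"), each = 3),
  iter       = rep(1:3, times = 2),
  results    = c(30.0, 23.0, 33.3, 313.0, 323.0, 350.0)
)

# split() returns a list with one numeric vector per experiment.
groups <- split(DF$results, DF$experiment)
str(groups)

# sapply() then applies sum to each list element.
via_sapply <- sapply(groups, sum)

# tapply() does the grouping and the summing in one call.
via_tapply <- tapply(DF$results, DF$experiment, sum)

# Same values and same group labels; only the container differs
# (named vector from sapply, one-dimensional array from tapply).
print(all.equal(as.vector(via_tapply), unname(via_sapply)))
print(identical(names(via_tapply), names(via_sapply)))
```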
If you are concerned about timings, data.table is your friend:
> # library(data.table)
> DT <- data.table(DF)
> DT[, sum(results), by=experiment]
experiment V1
1: A 86.3
2: B 986.0
Not so popular, but the doBy package is nice (equivalent to aggregate, even in syntax!):
> # library(doBy)
> summaryBy(results~experiment, FUN=sum, data=DF)
experiment results.sum
1 A 86.3
2 B 986.0
by also helps in this situation:
> (Aggregate.sums <- with(DF, by(results, experiment, sum)))
experiment: A
[1] 86.3
-------------------------------------------------------------------------
experiment: B
[1] 986
If you want the result to be a matrix, use either cbind or rbind:
> cbind(results=Aggregate.sums)
results
A 86.3
B 986.0
sqldf from the sqldf package could also be a good option:
> library(sqldf)
> sqldf("select experiment, sum(results) `sum.results`
from DF group by experiment")
experiment sum.results
1 A 86.3
2 B 986.0
xtabs also works (only when FUN=sum):
> xtabs(results ~ experiment, data=DF)
experiment
A B
86.3 986.0
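The restriction to sum is there because xtabs always adds up the left-hand side of the formula within each cell and has no FUN argument. As a rough check, assuming DF is built from the rows in the question, its result can be compared with the tapply version (the names xt and sums are illustrative):

```r
# Example data rebuilt from the question (the name DF is assumed).
DF <- data.frame(
  experiment = rep(c("A", "B"), each = 3),
  iter       = rep(1:3, times = 2),
  results    = c(30.0, 23.0, 33.3, 313.0, 323.0, 350.0)
)

# xtabs() sums `results` within each level of `experiment`.
xt <- xtabs(results ~ experiment, data = DF)

# Per-group sums computed explicitly for comparison.
sums <- tapply(DF$results, DF$experiment, sum)

# Compare the underlying numbers, ignoring class and attributes.
print(all.equal(as.vector(xt), as.vector(sums)))
```

For any other summary (mean, var, and so on), one of the alternatives that accept a function is needed instead.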