R: What are the best functions to deal with concatenating and averaging values in a data.frame?
Question
I have a data.frame built with this code:
my_df = data.frame("read_time" = c("2010-02-15", "2010-02-15",
"2010-02-16", "2010-02-16",
"2010-02-16", "2010-02-17"),
"OD" = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5) )
which produces:
> my_df
read_time OD
1 2010-02-15 0.1
2 2010-02-15 0.2
3 2010-02-16 0.1
4 2010-02-16 0.2
5 2010-02-16 0.4
6 2010-02-17 0.5
I want to average the OD column over each distinct read_time (notice that some dates are replicated and others are not), and I would also like to calculate the standard deviation, producing a table like this:
> my_df
read_time OD stdev
1 2010-02-15 0.15 0.05
5 2010-02-16 0.3 0.1
6 2010-02-17 0.5 0
Which functions are best for combining and summarizing such grouped values in a data.frame?
Answer
The plyr package is popular for this, but the base functions by() and aggregate() will also help.
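A minimal base-R sketch using aggregate(), assuming the goal is the per-date mean and sample standard deviation of OD: each statistic is computed with its own aggregate() call and the two results are joined on read_time with merge(). The object names od_mean, od_sd and summary_df are illustrative only.

```r
my_df <- data.frame(read_time = c("2010-02-15", "2010-02-15",
                                  "2010-02-16", "2010-02-16",
                                  "2010-02-16", "2010-02-17"),
                    OD = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5))

# Mean and sample standard deviation of OD for each distinct read_time
od_mean <- aggregate(OD ~ read_time, data = my_df, FUN = mean)
od_sd   <- aggregate(OD ~ read_time, data = my_df, FUN = sd)
names(od_sd)[2] <- "stdev"

# Join the two per-date summaries on the grouping column
summary_df <- merge(od_mean, od_sd, by = "read_time")
summary_df
```

As with ddply below, the single-observation date gets NA for stdev, because sd() needs at least two values.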
> library(plyr)
> ddply(my_df, "read_time", function(X) data.frame(OD=mean(X$OD),stdev=sd(X$OD)))
read_time OD stdev
1 2010-02-15 0.15000 0.07071
2 2010-02-16 0.23333 0.15275
3 2010-02-17 0.50000 NA
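The same per-group function can also be passed to base by(), which returns one small data.frame per read_time; do.call(rbind, ...) then stacks them. This is a sketch assuming only base R (no plyr); per_group and summary_df are illustrative names.

```r
my_df <- data.frame(read_time = c("2010-02-15", "2010-02-15",
                                  "2010-02-16", "2010-02-16",
                                  "2010-02-16", "2010-02-17"),
                    OD = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5))

# by() splits my_df by read_time and applies the function to each subset
per_group <- by(my_df, my_df$read_time, function(X)
  data.frame(read_time = X$read_time[1],
             OD        = mean(X$OD),
             stdev     = sd(X$OD)))

# Stack the per-group data.frames into a single summary table
summary_df <- do.call(rbind, per_group)
summary_df
```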
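sd() returns NA for the last date because that group has a single observation: the sample standard deviation divides by n - 1, which is 0 there. You can add a check that returns 0 instead of NA in that case. Note also that sd() computes the sample standard deviation, which is why the first group gives 0.07071 rather than the 0.05 in the question's table (0.05 is the population standard deviation of 0.1 and 0.2).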
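One way to fill in that missing bit, sketched with two hypothetical helpers that are not part of base R: sd_or_zero() keeps the sample standard deviation but returns 0 for one-value groups, and pop_sd() computes the population standard deviation (dividing by n), which is naturally 0 for a single value. The example uses base tapply(); inside the ddply call you could equally write stdev = sd_or_zero(X$OD).

```r
my_df <- data.frame(read_time = c("2010-02-15", "2010-02-15",
                                  "2010-02-16", "2010-02-16",
                                  "2010-02-16", "2010-02-17"),
                    OD = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5))

# Hypothetical helper: sample SD, but 0 (not NA) when a group has one value
sd_or_zero <- function(x) if (length(x) > 1) sd(x) else 0

# Hypothetical helper: population SD (divides by n instead of n - 1)
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# tapply() groups by the sorted distinct read_time values
summary_df <- data.frame(
  read_time = sort(unique(my_df$read_time)),
  OD        = as.vector(tapply(my_df$OD, my_df$read_time, mean)),
  stdev     = as.vector(tapply(my_df$OD, my_df$read_time, sd_or_zero)),
  pop_stdev = as.vector(tapply(my_df$OD, my_df$read_time, pop_sd))
)
summary_df
```

Which definition to use depends on whether the replicates are treated as a sample (sd) or as the whole population (pop_sd).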
Also, you don't need the quotes around the column names in your data.frame() call.