Calculate e.g. a mean in a list with multi-column data.frames
Problem description
I have a list of several data.frames. Each data.frame has several columns.
By using
mean(mylist$first_dataframe$a)
I can get the mean of a in this one data.frame.
However, I do not know how to calculate it over all the data.frames stored in my list, or over specific data.frames only.
I could use a loop, but I was told that apply() and its variants are better. I tried several solutions I found via search, but somehow none of them worked. I assume I need to use unlist().
Could you provide an example of how to calculate, e.g., a mean for a data structure like mine: a list of several data.frames, each containing several columns?
Update: I'm sorry for the confusion. I wanted the grand mean of a specific column across all data.frames. Thanks to Thomas for providing a working solution for calculating the grand mean of a specific column across all data.frames, and to psychometriko for providing a useful solution for calculating means over all columns in all data.frames (even when non-numeric data is involved).
Thanks!
Recommended answer
Is this what you're looking for?
set.seed(42)
mylist <- list(a = data.frame(foo = rnorm(10),
                              bar = rnorm(10)),
               b = data.frame(foo = rnorm(10),
                              bar = rnorm(10)),
               c = data.frame(foo = rnorm(10),
                              bar = rnorm(10)))
sapply(do.call("rbind", mylist), mean)
       foo        bar
 0.1163340 -0.1696556
Note: do.call("rbind", mylist) returns something similar to what you referred to above with the unlist function: a single data.frame with all rows stacked. sapply, as referred to by Roland in his answer, then just calls the function mean on each component (column) of the data.frame that results from the above do.call call.
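For the grand mean of one specific column, or for a subset of the data.frames, a minimal sketch along the same lines might look like the following. It is not necessarily the solution Thomas posted; the column name foo and the list names a, b, c are simply taken from the example above, and the variable names grand_foo, grand_foo_ac and per_df_foo are made up for illustration.

```r
# Example data with the same structure as in the answer above
set.seed(42)
mylist <- list(a = data.frame(foo = rnorm(10), bar = rnorm(10)),
               b = data.frame(foo = rnorm(10), bar = rnorm(10)),
               c = data.frame(foo = rnorm(10), bar = rnorm(10)))

# Grand mean of column "foo" across all data.frames:
# extract the column from each data.frame, flatten, then average
grand_foo <- mean(unlist(lapply(mylist, `[[`, "foo")))

# Same grand mean, restricted to the data.frames named "a" and "c"
grand_foo_ac <- mean(unlist(lapply(mylist[c("a", "c")], `[[`, "foo")))

# One mean per data.frame instead of a single grand mean
per_df_foo <- sapply(mylist, function(df) mean(df$foo))
```

Subsetting the list with mylist[c("a", "c")] before extracting the column is what limits the calculation to specific data.frames. Note that the average of per_df_foo matches grand_foo only when all data.frames have the same number of rows, as they do here.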
Edit: In response to the question of how to deal with non-numeric data.frame components, the below solution admittedly isn't very elegant and I'm sure better ones exist, but here's the first thing I was able to think of:
set.seed(42)
mylist <- list(a = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)),
               b = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)),
               c = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)))
sapply(do.call("rbind", mylist), function(x) {
  if (is.numeric(x)) mean(x)
})
$rand
[1] -0.02470602
$lets
NULL
This basically just creates a custom function that first tests whether each component is numeric and, if it is, returns its mean. If it isn't, the function returns NULL for that component, which is why sapply returns a list here rather than a named vector.
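One possible alternative, sketched here under the same example data, is to drop the non-numeric columns before averaging so that the result is a plain named numeric vector with no NULL entries. The names combined, numeric_cols and col_means are made up for illustration.

```r
# Example data with one numeric and one character column per data.frame
set.seed(42)
mylist <- list(a = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)),
               b = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)),
               c = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)))

# Stack all data.frames into one
combined <- do.call("rbind", mylist)

# Keep only the numeric columns, then take the mean of each
numeric_cols <- Filter(is.numeric, combined)
col_means <- sapply(numeric_cols, mean)
```

Filter(is.numeric, combined) returns a data.frame containing only the columns for which is.numeric is TRUE, so col_means here has a single element named rand.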