Calculate e.g. a mean in a list with multi-column data.frames


Question

I have a list of several data.frames. Each data.frame has several columns. By using mean(mylist$first_dataframe$a) I can get the mean of a in that one data.frame. However, I do not know how to calculate it over all the data.frames stored in my list, or over specific data.frames only.

I could use a loop, but I was told that apply() and its variants are better. I tried several solutions I found via search, but somehow they just don't work. I assume I need to use unlist().

Could you provide an example of how to calculate e.g. a mean for a data structure like mine: a list of several data.frames, each containing several columns?

Update: Sorry for the confusion. I wanted the grand mean of a specific column across all data.frames. Thanks to Thomas for a working solution for calculating the grand mean of a specific column across all data.frames, and to psychometriko for a useful solution for calculating means over all columns in all data.frames (even when non-numeric data is involved).

Thanks!

Recommended answer

Is this what you are looking for?

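set.seed(42)
# Three data.frames with identical columns, stored in a named list
mylist <- list(a=data.frame(foo=rnorm(10),
                            bar=rnorm(10)),
               b=data.frame(foo=rnorm(10),
                            bar=rnorm(10)),
               c=data.frame(foo=rnorm(10),
                            bar=rnorm(10)))
# Stack all data.frames row-wise, then take the mean of each column
sapply(do.call("rbind",mylist),mean)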

       foo        bar 
 0.1163340 -0.1696556 

Note: do.call("rbind",mylist) stacks all the data.frames in the list into a single data.frame, which is similar to what you had in mind with unlist(). sapply, as Roland mentions in his answer, then simply calls mean on each component (column) of that combined data.frame.
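The updated question asks for the grand mean of one specific column, optionally over only some of the data.frames. The following is a minimal sketch of one way to do that, not necessarily the solution Thomas posted. It assumes example data of the same shape as above and a column named foo; the variable names grand_foo, grand_foo_ac and foo_by_df are made up for illustration.

```r
# Example data with the same shape as in the answer above
set.seed(42)
mylist <- list(a = data.frame(foo = rnorm(10), bar = rnorm(10)),
               b = data.frame(foo = rnorm(10), bar = rnorm(10)),
               c = data.frame(foo = rnorm(10), bar = rnorm(10)))

# Grand mean of one column across all data.frames:
# extract the column from each data.frame, flatten, then average
grand_foo <- mean(unlist(lapply(mylist, function(d) d$foo)))

# Same idea restricted to specific data.frames, selected by name
grand_foo_ac <- mean(unlist(lapply(mylist[c("a", "c")], function(d) d$foo)))

# Per-data.frame means of the same column, one value per list element
foo_by_df <- sapply(mylist, function(d) mean(d$foo))
```

Note that mean(foo_by_df) matches grand_foo here only because every data.frame has the same number of rows; with unequal row counts the two differ, and the unlist() version gives the grand mean.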

Edit: In response to the question of how to deal with non-numeric data.frame components: the solution below admittedly isn't very elegant, and I'm sure better ones exist, but it is the first thing I was able to think of:

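set.seed(42)
# Each data.frame has one numeric column (rand) and one non-numeric column (lets)
mylist <- list(a=data.frame(rand=rnorm(10),
                            lets=sample(LETTERS,10,replace=TRUE)),
               b=data.frame(rand=rnorm(10),
                            lets=sample(LETTERS,10,replace=TRUE)),
               c=data.frame(rand=rnorm(10),
                            lets=sample(LETTERS,10,replace=TRUE)))
# Take the mean only where the column is numeric; other columns yield NULL
sapply(do.call("rbind",mylist),function(x) {
  if (is.numeric(x)) mean(x)
})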

$rand
[1] -0.02470602

$lets
NULL

This just defines a custom function that first tests whether each component (column) is numeric and, if it is, returns its mean. If it isn't, the function returns NULL for that column, which is why the result is a list rather than a named vector.
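As one possible alternative to the custom function above, the non-numeric columns can be dropped before taking the means, so that the result is a plain named numeric vector instead of a list with NULL entries. This is a sketch under the same example data as above; the names stacked and numeric_means are made up for illustration.

```r
# Example data: one numeric and one non-numeric column per data.frame
set.seed(42)
mylist <- list(a = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)),
               b = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)),
               c = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)))

# Stack all data.frames row-wise into one
stacked <- do.call("rbind", mylist)

# Keep only the numeric columns, then take the mean of each
numeric_means <- sapply(Filter(is.numeric, stacked), mean)
```

Filter(is.numeric, stacked) returns a data.frame containing only the columns for which is.numeric() is TRUE, so sapply never sees the lets column.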
