Covariance matrix by group

Problem description

I have been able to calculate covariance for my large data set with:

cov(MyMatrix, use = "pairwise.complete.obs", method = "pearson")

This gives the covariance table I was looking for and handles the NAs scattered throughout my data. For a deeper analysis, however, I want separate covariance matrices for each of the 800+ groups in my data set (some have 40+ observations, others only 1). I tried the following (from http://www.mail-archive.com/r-help@r-project.org/msg86328.html):

lapply(list(cov), by, data = MyMatrix[8:13], INDICES = MyMatrix["Group"])

This gave me the following error:

Error in tapply(seq_len(6L), list(MyMatrix["Group"] = NA_real_), function (x) : arguments must have same length
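The error is most likely not about the NAs. Assuming MyMatrix is a numeric matrix rather than a data frame (the name suggests so, but that is an assumption), single-bracket indexing without a comma treats it as one long vector: MyMatrix[8:13] picks six individual cells and MyMatrix["Group"] returns a single NA. by() then sees six rows of data but a grouping variable of length one, which fits the seq_len(6L) and NA_real_ in the message. The sketch below reproduces this on made-up data; only the column names come from the question.

```r
# Hypothetical stand-in for the poster's data: a numeric *matrix* (not a
# data frame) with a Group column and six measurement columns, random values.
set.seed(1)
measures <- c("HML", "RML", "FML", "TML", "FHD", "BIB")
MyMatrix <- cbind(Group = rep(1:3, each = 4),
                  matrix(rnorm(72), ncol = 6, dimnames = list(NULL, measures)))

# Without a comma, single-bracket indexing treats a matrix as one long vector:
MyMatrix[8:13]      # six individual cells, not six columns
MyMatrix["Group"]   # NA: the matrix has column names but no element names

# Selecting columns needs the comma form:
MyMatrix[, "Group"]     # one Group value per row
MyMatrix[, measures]    # the measurement columns, still a matrix
```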

This made me think the problem had to do with the missing (NA) data, so I tried adding the use="pairwise.complete.obs", method="pearson" arguments to the lapply code, but I can't get it to work. I'm not sure where they belong, so I tried putting them everywhere:

lapply(list(cov), use="pairwise.complete.obs",method="pearson"),by,data=MyMatrix[8:13], INDICES = MyMatrix["Group"])

lapply(list(cov),by,data=PhenoMtrix[8:13], INDICES = PhenoMtrix["Group"], use="pairwise.complete.obs",method="pearson")
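A hedged sketch of what these calls were probably meant to do: by() itself accepts extra arguments after FUN and passes them on, so use and method can go straight into by(), with no lapply(list(cov), ...) wrapper. This assumes the data are a data frame (a matrix can be converted with as.data.frame() first); PhenoDF and measures are hypothetical names and the values are random.

```r
# Hypothetical data frame shaped like the poster's: a Group column plus six
# measurement columns; group "c" has a single observation.
set.seed(2)
measures <- c("HML", "RML", "FML", "TML", "FHD", "BIB")
PhenoDF <- data.frame(Group = rep(c("a", "b", "c"), times = c(10, 6, 1)),
                      matrix(rnorm(17 * 6), ncol = 6,
                             dimnames = list(NULL, measures)))
PhenoDF$HML[c(2, 12)] <- NA   # scatter a few missing values

# by() splits the rows by Group and calls cov() on each subset; the extra
# arguments after FUN (use, method) are passed through to cov().
cov.by.group <- by(PhenoDF[, measures], PhenoDF$Group, cov,
                   use = "pairwise.complete.obs", method = "pearson")

cov.by.group[["a"]]   # 6 x 6 covariance matrix for group "a"
```

A single-observation group such as "c" comes back as a matrix of NA, since a covariance needs at least two rows.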

This is obviously sloppy and doesn't work, so I'm a little stuck. Thanks in advance for your help!

My data is formatted as follows:

Group     HML     RML     FML     TML    FHD    BIB
    1  323.50  248.75  434.50  355.75  46.84     NA
    2      NA  238.50  441.50  353.00  45.83  277.0
    2  309.50  227.75  419.00  332.25  46.39  284.0
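Since the answer below asks for example data, here is one way to turn the rows shown above into a reproducible data frame with read.table(). A minimal sketch; the object name posted is hypothetical.

```r
# Rebuild the three posted rows as a data frame; "NA" entries become NA.
posted <- read.table(header = TRUE, text = "
Group    HML    RML    FML    TML   FHD   BIB
    1 323.50 248.75 434.50 355.75 46.84    NA
    2     NA 238.50 441.50 353.00 45.83 277.0
    2 309.50 227.75 419.00 332.25 46.39 284.0
")

str(posted)            # Group plus six numeric measurement columns
table(posted$Group)    # observations per group: 1 for group 1, 2 for group 2
```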

Answer

This would be much better if you provided an example of your data (or all of it), but since you didn't, here is some sample data:

# create sample data
set.seed(1)
MyMatrix <- data.frame(group=rep(1:5, each=100),matrix(rnorm(2500),ncol=5))
# generate list of covariance matrices by group
cov.list <- lapply(unique(MyMatrix$group),
                   function(x)cov(MyMatrix[MyMatrix$group==x,-1],
                                  use="na.or.complete"))
cov.list[1]
# [[1]]
#             X1          X2          X3          X4          X5
# X1  0.80676209 -0.09541458 -0.12704666 -0.04122976  0.08636307
# X2 -0.09541458  0.93350463 -0.05197573 -0.06457299 -0.02203141
# X3 -0.12704666 -0.05197573  1.06030090  0.07324986  0.01840894
# X4 -0.04122976 -0.06457299  0.07324986  1.12059428  0.02385031
# X5  0.08636307 -0.02203141  0.01840894  0.02385031  1.11101410

In this example we create a data frame called MyMatrix with six columns. The first is group, and the other five, X1, X2, ... X5, contain the data whose covariances we want. Hopefully, this is similar to the structure of your data set.

The line that does the work is:

cov.list <- lapply(unique(MyMatrix$group),
                   function(x)cov(MyMatrix[MyMatrix$group==x,-1],
                                  use="na.or.complete"))

This takes the vector of group IDs (from unique(MyMatrix$group)) and calls the anonymous function with each of them. For each group, the function computes the covariance matrix of all columns of MyMatrix except the first, using only the rows in that group, and lapply collects the results in a 5-element list (there are 5 groups in this example).
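With 800+ groups, it can be handier to get a list named by group ID, so a matrix is looked up as cov.list[["3"]] rather than by position. A variant sketch that swaps split() in for the unique()/subsetting step; by.group is a hypothetical name and the data are the answer's simulated sample.

```r
# Same simulated sample data as in the answer.
set.seed(1)
MyMatrix <- data.frame(group = rep(1:5, each = 100), matrix(rnorm(2500), ncol = 5))

# split() breaks the measurement columns into one data frame per group,
# and the resulting list is named by group ID.
by.group <- split(MyMatrix[, -1], MyMatrix$group)
cov.list <- lapply(by.group, cov, use = "na.or.complete")

names(cov.list)    # "1" "2" "3" "4" "5"
cov.list[["3"]]    # covariance matrix for group 3, looked up by its ID
```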

Note on handling NA: there are several options, so review the documentation in ?cov to see what each one does. The option chosen here, use="na.or.complete", includes in the calculation only rows that have no NA values in any of the columns. If a given group has no such rows, cov(...) returns NA rather than stopping with an error.
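A small illustration of how the use options differ, on a hypothetical three-column group in which every row contains at least one NA. It also shows why the single-observation groups mentioned in the question will always come back as NA.

```r
# Hypothetical group: no row is complete, but every pair of columns
# is jointly observed in at least two rows.
g <- data.frame(a = c(1, 2, NA, 4, 5, NA),
                b = c(2, NA, 6, 8, NA, 3),
                c = c(NA, 1, 5, NA, 3, 4))

cov(g, use = "na.or.complete")          # no complete rows, so the result is NA
cov(g, use = "pairwise.complete.obs")   # each pair uses rows where both are present
# cov(g, use = "complete.obs")          # would stop with an error instead

# A single observation gives NA whatever `use` is chosen,
# because a covariance needs at least two rows.
cov(g[1, c("a", "b")], use = "pairwise.complete.obs")
```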
