Dealing with missing values for correlation calculation


Question

I have a huge matrix with a lot of missing values, and I want to get the correlations between its variables.

1. Is the solution

cor(na.omit(matrix))

better than the following?

cor(matrix, use = "pairwise.complete.obs")

I have already selected only the variables with more than 20% missing values.

2. Which method makes the most sense?

Recommended answer

I would vote for the second option. It sounds like you have a fair amount of missing data, so you would be looking for a sensible multiple imputation strategy to fill in the gaps. See Harrell's text "Regression Modeling Strategies" for a wealth of guidance on how to do this properly.
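
To see how the two options from the question differ in practice, here is a minimal sketch on made-up data. The matrix `m`, its column names, and the 25% missingness rate are assumptions chosen for illustration only; they are not taken from the question.

```r
set.seed(1)

# Hypothetical data: 200 rows, 4 variables that share a common signal z
n <- 200
z <- rnorm(n)
m <- cbind(a = z + rnorm(n), b = z + rnorm(n),
           c = z + rnorm(n), d = z + rnorm(n))

# Set 25% of all cells to NA, completely at random (an assumption)
m[sample(length(m), size = 0.25 * length(m))] <- NA

# Option 1: listwise deletion, only rows without any NA are kept
complete <- na.omit(m)
cor_listwise <- cor(complete)

# Option 2: each pair of variables uses every row where both are observed
cor_pairwise <- cor(m, use = "pairwise.complete.obs")

# Number of rows behind each pairwise correlation
# (the diagonal holds the number of observed values per variable)
observed <- (!is.na(m)) * 1
n_pairwise <- crossprod(observed)

nrow(complete)   # rows used by option 1 for every correlation
n_pairwise       # rows used by option 2, pair by pair
```

Comparing `nrow(complete)` with `n_pairwise` shows how many more observations the pairwise option keeps. One thing to keep in mind is that each entry of `cor_pairwise` is computed from a different subset of rows, so the resulting matrix is not guaranteed to be positive semi-definite.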
