折叠行,其中一些都是 NA,其他与一些 NA 不相交 [英] Collapsing rows where some are all NA, others are disjoint with some NAs

查看:30
本文介绍了折叠行,其中一些都是 NA,其他与一些 NA 不相交的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个简单的数据框:

I have a simple dataframe as such:

ID    Col1    Col2    Col3    Col4
1     NA      NA      NA      NA  
1     5       10      NA      NA
1     NA      NA      15      20
2     NA      NA      NA      NA  
2     25      30      NA      NA
2     NA      NA      35      40 

我想重新格式化它:

ID    Col1    Col2    Col3    Col4
1     5       10      15      20
2     25      30      35      40

(请注意:真实数据集有数千行,值来自生物数据——NA 没有遵循简单的模式,除了NAs 是不相交的,是的,每个 ID 正好有 3 行).

(please note: the real data set has thousands of rows and the values are from biological data -- the NAs follow no simple pattern, except that the NAs are disjoint, and yes there are exactly 3 rows for each ID).

第一步:删除只有 NA 值的行.

STEP ONE: get rid of rows that have only NA values.

表面上这看起来很简单,但我遇到了一些问题.

On the surface this looked simple, but I've run across some problems.

complete.cases(DF) 返回所有 FALSE,所以我真的不能用它来删除所有 NA 的行,如DF[complete.cases(DF),].这是因为所有行都包含至少一个 NA.

complete.cases(DF) returns all FALSE, so I can't really use this to remove the rows with all NAs, as in DF[complete.cases(DF),]. This is because all rows contain at least one NA.

由于 NA 想要自我传播,其他使用 is.na 的方案也出于同样的原因失败.

Since NAs want to propagate themselves, other schemes using is.na fail for the same reason.

第二步:将剩余的两行折叠为一.

STEP TWO: collapse the remaining two rows into one.

考虑使用诸如 aggregate 之类的东西来实现这一点,但必须有比 this,根本不起作用.

Thinking about using something like aggregate to pull this off, but there has got to be an easier way than this, which doesn't work at all.

感谢您的建议.

推荐答案

尝试

library(dplyr)
DF %>% group_by(ID) %>% summarise_each(funs(sum(., na.rm = TRUE))) 

<小时>

为了说明一列具有特定ID的所有NAs的情况,我们需要sum_NA() 函数返回 NA 如果都是 NAs


To account for the case in which one column has all NAs for a certain ID, we need sum_NA() function which returns NA if all are NAs

txt <- "ID    Col1    Col2    Col3    Col4
        1     NA      NA      NA      NA
        1     5       10      NA      NA
        1     NA      NA      15      20
        2     NA      NA      NA      NA
        2     NA      30      NA      NA
        2     NA      NA      35      40"
DF <- read.table(text = txt, header = TRUE)

# original code
DF %>% 
  group_by(ID) %>% 
  summarise_each(funs(sum(., na.rm = TRUE)))

# `summarise_each()` is deprecated.
# Use `summarise_all()`, `summarise_at()` or `summarise_if()` instead.
# To map `funs` over all variables, use `summarise_all()`
# A tibble: 2 x 5
     ID  Col1  Col2  Col3  Col4
  <int> <int> <int> <int> <int>
1     1     5    10    15    20
2     2     0    30    35    40

sum_NA <- function(x) {if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)}

DF %>%
  group_by(ID) %>%
  summarise_all(funs(sum_NA))

DF %>%
  group_by(ID) %>%
  summarise_if(is.numeric, funs(sum_NA))

# A tibble: 2 x 5
     ID  Col1  Col2  Col3  Col4
  <int> <int> <int> <int> <int>
1     1     5    10    15    20
2     2    NA    30    35    40

这篇关于折叠行,其中一些都是 NA,其他与一些 NA 不相交的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆