Combine rows by group with differing NAs in each row

Question

I can't find an exact answer to this problem, so I hope I'm not duplicating a question.

I have a data frame like the following:

groupid  col1  col2  col3  col4
   1      0     n     NA     2    
   1      NA    NA    2      2

What I'm trying to convey is that there are duplicate IDs where the complete information is spread across two rows, and I want to combine these rows so that all the information ends up in one row. How do I go about this?

I've tried playing around with group_by and paste, but that ends up making the data messier (for example, getting 22 instead of 2 in col4). sum() does not work either, because some columns are strings, and the ones that are not are categorical variables, so summing them would change the information.

Is there something I can do to collapse the rows, filling in the NAs while leaving consistent data unchanged?

Sorry, the desired output is as follows:

groupid  col1  col2  col3  col4
   1      0     n     2     2

Recommended answer

Is this what you want? It uses zoo + dplyr.

library(dplyr)
library(zoo)

df %>%
    group_by(groupid) %>%
    # carry the last non-NA value forward within each group
    mutate_all(~ na.locf(., na.rm = FALSE, fromLast = FALSE)) %>%
    filter(row_number() == n())


# A tibble: 1 x 5
# Groups:   groupid [1]
  groupid  col1  col2  col3  col4
    <int> <int> <chr> <int> <int>
1       1     0     n     2     2

EDIT1

Without the filter, the whole data frame is returned:

df %>%
    group_by(groupid) %>%
    mutate_all(~ na.locf(., na.rm = FALSE, fromLast = FALSE))

# A tibble: 2 x 5
# Groups:   groupid [1]
  groupid  col1  col2  col3  col4
    <int> <int> <chr> <int> <int>
1       1     0     n    NA     2
2       1     0     n     2     2

The filter here just keeps the last row of each group. na.locf carries the previous non-NA value forward, which means the last row in each group is the one you want.
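To make the carry-forward behaviour concrete, here is a minimal sketch of na.locf on a single column. The vector x is made up for illustration and is not part of the question's data.

```r
library(zoo)

# Made-up column with gaps (an assumption for illustration only)
x <- c(NA, 0, NA, NA, 3, NA)

# Carry the last observed value forward.
# na.rm = FALSE keeps the leading NA, so the length of x is preserved.
filled <- na.locf(x, na.rm = FALSE)
print(filled)
```

Each NA is replaced by the most recent non-NA value before it, and the leading NA stays because nothing precedes it. Within a group, this is why the last row ends up holding the latest known value of every column.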

Also, based on @thelatemail's recommendation, you can do the following, which gives the same answer:

df %>%
    group_by(groupid) %>%
    # take the first non-NA value of each column within the group
    summarise_all(~ .[!is.na(.)][1])
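The "first non-NA value per column" idea can also be written with a named helper, which may be easier to read. This is only a sketch: first_non_na is a hypothetical helper name, the data frame is rebuilt from the two rows in the question, and across() assumes dplyr 1.0 or later.

```r
library(dplyr)

# The two rows from the question, rebuilt by hand
df <- data.frame(
  groupid = c(1L, 1L),
  col1    = c(0L, NA),
  col2    = c("n", NA),
  col3    = c(NA, 2L),
  col4    = c(2L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA value of a vector (NA if there is none)
first_non_na <- function(x) x[!is.na(x)][1]

result <- df %>%
  group_by(groupid) %>%
  summarise(across(everything(), first_non_na))

print(result)
```

Note that this keeps the first non-NA value in each column, whereas the na.locf approach ends up with the last one. The two agree as long as a group has no conflicting values.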

EDIT2

Now assume you have conflicting values and want to show them all:

df <- read.table(text = "groupid  col1  col2  col3  col4
   1      0     n     NA     2
   1      1     NA    2      2",
                 header = TRUE, stringsAsFactors = FALSE)
df
  groupid col1 col2 col3 col4
1       1    0    n   NA    2
2       1    1 <NA>    2    2

df %>%
    group_by(groupid) %>%
    # unique() collapses repeated values such as the two 2s in col4
    summarise_all(~ toString(unique(na.omit(.))))
  groupid  col1  col2  col3  col4
    <int> <chr> <chr> <chr> <chr>
1       1  0, 1     n     2   2
