Combine rows by group with differing NAs in each row
Question
I can't find an exact answer to this problem, so I hope I'm not duplicating a question.
I have a data frame like the following:
groupid col1 col2 col3 col4
1 0 n NA 2
1 NA NA 2 2
What I'm trying to convey is that there are duplicate IDs whose information is spread across two rows, and I want to combine those rows so that all the information ends up in a single row. How do I go about this?
I've tried playing around with group_by and paste, but that makes the data messier (for example, getting 22 instead of 2 in col4). sum() does not work either, because some columns are strings, and those that are not are categorical variables, so summing them would change the information.
Is there something I can do to collapse the rows, filling in the NAs while leaving consistent data unchanged?
Sorry, the desired output is as follows:
groupid col1 col2 col3 col4
1 0 n 2 2
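Recommended answer
Is this what you want? It uses zoo + dplyr.
library(dplyr); library(zoo)
df %>%
  group_by(groupid) %>%
  mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>%  # carry the last non-NA value forward within each group
  filter(row_number() == n())                                       # keep only the last row of each group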
# A tibble: 1 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n 2 2
EDIT1
Without the filter, the whole data frame is returned.
df %>%
group_by(groupid) %>%
mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))
# A tibble: 2 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n NA 2
2 1 0 n 2 2
The filter here just keeps the last row of each group. na.locf carries the previous non-NA value forward, which means the last row in each group is the one you want.
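The same fill-then-keep-last idea can also be sketched without zoo. The version below is only an illustration, assuming tidyr is available and dplyr is at least 1.0.0 (for slice_tail()). The name `collapsed` and the explicit column range `col1:col4` are choices made for this example, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- read.table(text = "groupid col1 col2 col3 col4
1 0 n NA 2
1 NA NA 2 2",
  header = TRUE, stringsAsFactors = FALSE)

collapsed <- df %>%
  group_by(groupid) %>%
  fill(col1:col4, .direction = "down") %>%  # carry the last non-NA value forward within each group
  slice_tail(n = 1) %>%                     # keep the last row per group, which is now complete
  ungroup()

collapsed
```

As with na.locf, this relies on row order: if two rows of a group hold different non-NA values in the same column, the later one wins.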
Alternatively, based on @thelatemail's recommendation, you can do the following, which gives the same answer.
df %>% group_by(groupid) %>% summarise_all(funs(.[!is.na(.)][1]))
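In recent dplyr releases funs() is deprecated and the _all verbs are superseded by across(). A possible equivalent of the one-liner above, assuming dplyr 1.0.0 or later, might look like the sketch below. The helper name `first_non_na` is made up for this example.

```r
library(dplyr)

# Example data from the question
df <- read.table(text = "groupid col1 col2 col3 col4
1 0 n NA 2
1 NA NA 2 2",
  header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: first non-NA value of a vector (NA if there is none)
first_non_na <- function(x) x[!is.na(x)][1]

result <- df %>%
  group_by(groupid) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")

result
```

Inside summarise(), across(everything(), ...) skips the grouping column, so groupid is kept as the key and only col1 to col4 are collapsed.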
EDIT2
Suppose you have conflicting values and you want to show them all.
df <- read.table(text="groupid col1 col2 col3 col4
1 0 n NA 2
1 1 NA 2 2",
header=TRUE,stringsAsFactors=FALSE)
df
groupid col1 col2 col3 col4
1 1 0 n NA 2
2 1 1 <NA> 2 2
df %>%
group_by(groupid) %>%
summarise_all(funs(toString(unique(na.omit(.))))) # unique() collapses duplicated values, as in col4
groupid col1 col2 col3 col4
<int> <chr> <chr> <chr> <chr>
1 1 0, 1 n 2 2
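Before collapsing, it can help to see which columns actually conflict within a group. The sketch below is one possible check, assuming dplyr 1.0.0 or later. It flags a column as TRUE when a group holds more than one distinct non-NA value in it. The name `conflicts` is chosen for this example.

```r
library(dplyr)

# Conflicting example data from EDIT2
df <- read.table(text = "groupid col1 col2 col3 col4
1 0 n NA 2
1 1 NA 2 2",
  header = TRUE, stringsAsFactors = FALSE)

conflicts <- df %>%
  group_by(groupid) %>%
  # TRUE where a group has more than one distinct non-NA value in the column
  summarise(across(everything(), ~ n_distinct(.x, na.rm = TRUE) > 1),
            .groups = "drop")

conflicts
```

Here only col1 is flagged, because it holds both 0 and 1. col4 repeats the same value and col2 and col3 each have a single non-NA value.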