Merge rows in a dataframe where the rows are disjoint and contain NAs
I have a dataframe that has two rows:
| code | name | v1 | v2 | v3 | v4 |
|------|-------|----|----|----|----|
| 345 | Yemen | NA | 2 | 3 | NA |
| 346 | Yemen | 4 | NA | NA | 5 |
Is there an easy way to merge these two rows? If I changed the code "345" to "346", would that make things easier?
You can use `aggregate`. Assuming that you want to merge rows with identical values in column `name`:
    aggregate(x = DF[c("v1","v2","v3","v4")], by = list(name = DF$name), min, na.rm = TRUE)
       name v1 v2 v3 v4
    1 Yemen  4  2  3  5
This is like the SQL query `SELECT name, MIN(v1) FROM DF GROUP BY name`. The choice of `min` is arbitrary; you could also use `max` or `mean`. Given one NA and one non-NA value, each of them returns the non-NA value, provided that `na.rm = TRUE`.
(An SQL-like `coalesce()` function would express the intent better if one existed in base R.)
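As a rough sketch of that coalesce idea, a small helper can be passed to `aggregate` in place of `min`. The name `first_non_na` is hypothetical and not part of base R, and the data frame is rebuilt from the table in the question.

```r
# Rebuild the example data frame from the question.
DF <- data.frame(
  code = c(345, 346),
  name = c("Yemen", "Yemen"),
  v1 = c(NA, 4),
  v2 = c(2, NA),
  v3 = c(3, NA),
  v4 = c(NA, 5)
)

# Hypothetical helper (not in base R): return the first non-NA value
# of a vector, or NA if every value is missing.
first_non_na <- function(x) {
  keep <- x[!is.na(x)]
  if (length(keep) == 0) NA else keep[1]
}

# Collapse all rows that share the same name into one row.
merged <- aggregate(
  x = DF[c("v1", "v2", "v3", "v4")],
  by = list(name = DF$name),
  FUN = first_non_na
)
print(merged)
```

Unlike `min` with `na.rm = TRUE`, this helper also works for character columns and gives NA, without a warning, when a group has no value at all for a column.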
However, you should first check that all non-NA values for a given `name` are identical. For example, run `aggregate` with both `min` and `max` and compare the results, or run it with `range`.
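The consistency check described above could look roughly like this. It is a minimal sketch on the data from the question; the variable names `lo`, `hi` and `consistent` are only illustrative.

```r
# Rebuild the example data frame from the question.
DF <- data.frame(
  code = c(345, 346),
  name = c("Yemen", "Yemen"),
  v1 = c(NA, 4),
  v2 = c(2, NA),
  v3 = c(3, NA),
  v4 = c(NA, 5)
)

cols <- c("v1", "v2", "v3", "v4")

# Aggregate once with min and once with max, ignoring NAs.
lo <- aggregate(x = DF[cols], by = list(name = DF$name), FUN = min, na.rm = TRUE)
hi <- aggregate(x = DF[cols], by = list(name = DF$name), FUN = max, na.rm = TRUE)

# A group is consistent when min and max agree in every column,
# i.e. all of its non-NA values per column are identical.
# Note: a column that is entirely NA within a group yields Inf/-Inf
# (with a warning) and would therefore be flagged as inconsistent.
consistent <- rowSums(lo[cols] != hi[cols]) == 0
print(data.frame(name = lo$name, consistent = consistent))
```

Groups flagged as inconsistent contain conflicting values and should be inspected by hand before merging.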
Finally, if you have many more variables than just v1-4, you could use `DF[,!(names(DF) %in% c("code","name"))]` to define the columns.
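Put together, selecting the value columns by exclusion might look like the sketch below. `value_cols` is an illustrative name, and `drop = FALSE` is an addition that keeps the result a data frame even if only one value column remains.

```r
# Rebuild the example data frame from the question.
DF <- data.frame(
  code = c(345, 346),
  name = c("Yemen", "Yemen"),
  v1 = c(NA, 4),
  v2 = c(2, NA),
  v3 = c(3, NA),
  v4 = c(NA, 5)
)

# Every column except the identifiers is treated as a value column.
value_cols <- !(names(DF) %in% c("code", "name"))

merged <- aggregate(
  x = DF[, value_cols, drop = FALSE],
  by = list(name = DF$name),
  FUN = min,
  na.rm = TRUE
)
print(merged)
```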