Merge rows in a dataframe where the rows are disjoint and contain NAs

Problem description

I have a dataframe that has two rows:

| code | name  | v1 | v2 | v3 | v4 |
|------|-------|----|----|----|----|
| 345  | Yemen | NA | 2  | 3  | NA |
| 346  | Yemen | 4  | NA | NA | 5  |

Is there an easy way to merge these two rows? If I renamed "345" to "346", would that make things easier?

Solution

You can use aggregate(). Assuming that you want to merge rows that have identical values in the name column:

aggregate(x=DF[c("v1","v2","v3","v4")], by=list(name=DF$name), min, na.rm = TRUE)
   name v1 v2 v3 v4
1 Yemen  4  2  3  5

This is like the SQL query SELECT name, min(v1) FROM DF GROUP BY name. The choice of min is arbitrary: max or mean would work just as well, because with na.rm = TRUE each of them returns the non-NA value when given one NA and one non-NA value. (An SQL-like coalesce() function would express the intent better, if base R had one.)
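To illustrate the coalesce idea, here is a minimal sketch that passes a small helper to aggregate() in place of min. The helper first_non_na is a made-up name, not a base R function, and DF is rebuilt here from the table in the question.

```r
# Example data reconstructed from the table in the question.
DF <- data.frame(
  code = c(345, 346),
  name = c("Yemen", "Yemen"),
  v1 = c(NA, 4),
  v2 = c(2, NA),
  v3 = c(3, NA),
  v4 = c(NA, 5)
)

# Hypothetical helper (not part of base R): return the first non-NA value.
# Indexing an empty vector with [1] yields NA, so an all-NA group stays NA.
first_non_na <- function(x) x[!is.na(x)][1]

merged <- aggregate(
  x = DF[c("v1", "v2", "v3", "v4")],
  by = list(name = DF$name),
  FUN = first_non_na
)
print(merged)
```

Unlike min with na.rm = TRUE, this helper also works on non-numeric columns, and it does not warn when a group has no non-NA value at all.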

However, you should first check that all non-NA values for a given name are identical. For example, run aggregate with both min and max and compare the results, or run it with range.
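A rough sketch of the min/max comparison could look like the following. The two "Testland" rows are invented purely to show a conflict and are not part of the question's data. The variable names value_cols, lo, hi and consistent are likewise just illustrative.

```r
# Question's data plus two hypothetical rows that disagree on v1.
DF <- data.frame(
  code = c(345, 346, 901, 902),
  name = c("Yemen", "Yemen", "Testland", "Testland"),
  v1 = c(NA, 4, 1, 9),
  v2 = c(2, NA, 2, NA),
  v3 = c(3, NA, NA, 3),
  v4 = c(NA, 5, 8, 8)
)

value_cols <- c("v1", "v2", "v3", "v4")

# Same grouping twice, once with min and once with max.
lo <- aggregate(DF[value_cols], by = list(name = DF$name), FUN = min, na.rm = TRUE)
hi <- aggregate(DF[value_cols], by = list(name = DF$name), FUN = max, na.rm = TRUE)

# A group is consistent only if min and max agree in every value column.
# Note: a column that is entirely NA within a group would give Inf vs -Inf
# (with a warning) and would therefore be flagged as inconsistent here.
consistent <- rowSums(lo[value_cols] != hi[value_cols]) == 0

# Names whose rows cannot be merged without losing information.
print(lo$name[!consistent])
```

Here only the invented "Testland" group is flagged, because its two rows carry different non-NA values in v1.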

Finally, if you have many more variables than just v1 to v4, you can select the columns with DF[,!(names(DF) %in% c("code","name"))] instead of listing them by name.
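Put together, the column selection might be used like this (a sketch with DF rebuilt from the question's table; value_cols is an illustrative name):

```r
# Example data reconstructed from the table in the question.
DF <- data.frame(
  code = c(345, 346),
  name = c("Yemen", "Yemen"),
  v1 = c(NA, 4),
  v2 = c(2, NA),
  v3 = c(3, NA),
  v4 = c(NA, 5)
)

# Every column except the identifier columns is treated as a value column.
value_cols <- !(names(DF) %in% c("code", "name"))

# DF[value_cols] (no comma) always returns a data frame, even if only one
# column is left; DF[, value_cols] would drop to a plain vector in that case.
merged <- aggregate(
  x = DF[value_cols],
  by = list(name = DF$name),
  FUN = min,
  na.rm = TRUE
)
print(merged)
```

Dropping the comma is a small deviation from the expression above. It only matters when a single value column remains, but it keeps the column names in the result intact.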
