Replace a subset of a data frame with dplyr join operations
Question
Suppose I have applied a treatment to some column values of a data frame like this:
id animal weight height ...
1 dog 23.0
2 cat NA
3 duck 1.2
4 fairy 0.2
5 snake BAD
df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 stringsAsFactors = FALSE)  # keep strings as character in R < 4.0
Suppose the treatment has to be done in a separate table, and it produces the following data frame, which is a subset of the original:
id animal weight
2 cat 2.2
5 snake 1.3
sub_df <- data.frame(id = c(2, 5),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"),
                     stringsAsFactors = FALSE)  # keep strings as character in R < 4.0
Now I want to put everything back together, so I use an operation like this:
> library(dplyr)
> df %>%
+   anti_join(sub_df, by = c("id", "animal")) %>%
+   bind_rows(sub_df)
id animal weight
4 fairy 0.2
1 dog 23.0
3 duck 1.2
2 cat 2.2
5 snake 1.3
Is there some way to do this directly with a join operation?
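And if the subset contains only the key columns and the variables subject to the treatment (id, animal, weight) rather than all the variables of the original data frame (id, animal, weight, height), how can I put the subset back together with the original?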
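To make the second question concrete: with anti_join() followed by bind_rows(), the rows that come from a subset lacking the height column end up with NA in that column. A minimal sketch, assuming a hypothetical numeric height column whose values are made up purely for illustration:

```r
library(dplyr)

# Original data with a hypothetical height column (made-up values).
df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 height = c(60, 25, 30, 10, 5),
                 stringsAsFactors = FALSE)

# The treated subset only carries the key columns and the treated variable.
sub_df <- data.frame(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"),
                     stringsAsFactors = FALSE)

# Drop the treated rows, then append the subset: bind_rows() fills the
# missing height column with NA, so the heights of cat and snake are lost.
res <- df %>%
  anti_join(sub_df, by = c("id", "animal")) %>%
  bind_rows(sub_df)
print(res)
```

This is why a plain "remove and re-append" only works when the subset carries every column of the original.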
Recommended answer
What you describe is a join operation in which you update some values in the original dataset. This is very easy to do, with great performance, using data.table, because of its fast joins and its update-by-reference operator (:=).
Here's an example for your toy data:
library(data.table)
setDT(df) # convert to data.table without copy
setDT(sub_df) # convert to data.table without copy
# join and update "df" by reference, i.e. without copy
df[sub_df, on = c("id", "animal"), weight := i.weight]
The data is now updated:
# id animal weight
#1: 1 dog 23.0
#2: 2 cat 2.2
#3: 3 duck 1.2
#4: 4 fairy 0.2
#5: 5 snake 1.3
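The same update join also covers the case where the subset holds only the key columns and the treated variable: := assigns only to the column named on its left-hand side, so every other column of df is left as it was. A minimal sketch, assuming a hypothetical numeric height column with made-up values:

```r
library(data.table)

# Toy data; the height values are hypothetical, added only to show that
# columns absent from the subset survive the update join.
df <- data.table(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 height = c(60, 25, 30, 10, 5))

# The treated subset carries only the key columns and the treated variable.
sub_df <- data.table(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"))

# Update join: for rows of df matching sub_df on (id, animal), overwrite
# weight with sub_df's weight (i.weight); height is never touched.
df[sub_df, on = c("id", "animal"), weight := i.weight]

print(df)
```

Row order and all untouched columns of df are preserved, because the table is modified in place rather than rebuilt.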
You can use setDF() to switch back to an ordinary data.frame.
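If you would rather stay within dplyr, dplyr 1.0.0 and later provide rows_update(), which performs this kind of update join directly: rows of x whose key matches a row of y get the columns present in y overwritten, and all other rows and columns of x are kept. This is a different tool from the data.table approach, offered as a sketch assuming dplyr >= 1.0.0 and a hypothetical height column with made-up values:

```r
library(dplyr)

# Toy data; the height values are hypothetical, for illustration only.
df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 height = c(60, 25, 30, 10, 5),
                 stringsAsFactors = FALSE)

# Treated subset: only the key columns plus the treated variable.
sub_df <- data.frame(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"),
                     stringsAsFactors = FALSE)

# Update join: rows of df matching sub_df on (id, animal) take sub_df's
# weight; all other rows and columns (including height) are unchanged.
# By default every key in sub_df must exist in df, otherwise it errors.
result <- rows_update(df, sub_df, by = c("id", "animal"))
print(result)
```

Unlike data.table's :=, rows_update() returns a new data frame instead of modifying df in place, and it keeps the row order of df.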