Replace a subset of a data frame with dplyr join operations
Question
Suppose I have applied a treatment to some column values of a data frame like this:
id animal weight height ...
1 dog 23.0
2 cat NA
3 duck 1.2
4 fairy 0.2
5 snake BAD
df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"))
Suppose the treatment had to be done in a separate table and produced the following data frame, which is a subset of the original:
id animal weight
2 cat 2.2
5 snake 1.3
sub_df <- data.frame(id = c(2, 5),
animal = c("cat", "snake"),
weight = c("2.2", "1.3"))
Now I want to put everything back together, so I use an operation like this:
> df %>%
anti_join(sub_df, by = c("id", "animal")) %>%
bind_rows(sub_df)
id animal weight
4 fairy 0.2
1 dog 23.0
3 duck 1.2
2 cat 2.2
5 snake 1.3
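Is there a way to do this directly with a join operation?
And if the subset contains only the key columns and the treated variable (id, animal, weight) rather than every variable of the original data frame (id, animal, weight, height), how can the subset be combined with the original?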
Recommended answer
What you describe is a join operation in which you update some values in the original dataset. This is easy to do efficiently with data.table because of its fast joins and its update-by-reference operator (:=).
Here's an example for your toy data:
library(data.table)
setDT(df) # convert to data.table without copy
setDT(sub_df) # convert to data.table without copy
# join and update "df" by reference, i.e. without copy
df[sub_df, on = c("id", "animal"), weight := i.weight]
The data is now updated:
# id animal weight
#1: 1 dog 23.0
#2: 2 cat 2.2
#3: 3 duck 1.2
#4: 4 fairy 0.2
#5: 5 snake 1.3
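The question also asks what happens when the original table has columns the subset lacks (such as height). The following is a minimal sketch of that case, not part of the original answer. The height values are invented for illustration. Because the update join only assigns to weight, the other columns and the row order should be left as they were.

```r
library(data.table)

# Hypothetical extension of the toy data: a "height" column that the
# treatment never touched (values are made up for illustration).
df <- data.table(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 height = c(60, 25, 30, 5, 10))

# The treated subset carries only the key columns and the corrected weight.
sub_df <- data.table(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"))

# Update join: for rows of df that match sub_df on the keys, overwrite
# "weight" with the value from sub_df (i.weight). df is modified in place.
df[sub_df, on = c("id", "animal"), weight := i.weight]

print(df)
```

Rows with no match in sub_df keep their existing weight, so no anti_join or bind_rows step is needed.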
You can use setDF to switch back to an ordinary data.frame.
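If you would rather stay within dplyr, as the question asks, the sketch below shows two possible approaches. It is not part of the original answer. The first uses a left join followed by coalesce(). The second uses rows_update(), which assumes dplyr 1.0.0 or later. The height column and its values are hypothetical, added only to show that untouched columns survive.

```r
library(dplyr)

# Toy data with a hypothetical "height" column (values made up).
df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 height = c(60, 25, 30, 5, 10),
                 stringsAsFactors = FALSE)

sub_df <- data.frame(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"),
                     stringsAsFactors = FALSE)

# Approach 1: left join, then take the new weight where one exists and
# fall back to the old weight otherwise.
updated_join <- df %>%
  left_join(sub_df, by = c("id", "animal"), suffix = c("", ".new")) %>%
  mutate(weight = coalesce(weight.new, weight)) %>%
  select(-weight.new)

# Approach 2: rows_update() overwrites matching rows of df with the
# values from sub_df (assumes dplyr >= 1.0.0).
updated_rows <- rows_update(df, sub_df, by = c("id", "animal"))

print(updated_join)
print(updated_rows)
```

One difference to keep in mind: with coalesce(), an NA in the subset's weight leaves the old value in place, whereas rows_update() copies the subset's value as it is. By default rows_update() also expects every key in sub_df to exist in df.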