Replace a subset of a data frame with dplyr join operations

Problem description

Suppose I have applied a treatment to some column values of a data frame like this:

  id animal weight   height ...
  1    dog     23.0
  2    cat     NA
  3   duck     1.2
  4  fairy     0.2
  5  snake     BAD


df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"))

Suppose the treatment has to be carried out in a separate table and yields the following data frame, which is a subset of the original:

  id animal weight
  2    cat    2.2
  5  snake    1.3

sub_df <- data.frame(id = c(2, 5),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"))

Now I want to put everything back together, so I use an operation like this:

library(dplyr)

df %>%
  anti_join(sub_df, by = c("id", "animal")) %>%
  bind_rows(sub_df)

 id animal weight
 4  fairy    0.2
 1    dog   23.0
 3   duck    1.2
 2    cat    2.2
 5  snake    1.3

Is there a way to do this directly with a join operation?

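If the subset contains only the key columns and the treated variable (id, animal, weight), rather than all the variables of the original data frame (id, animal, weight, height), how can the subset be combined with the original set?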

Recommended answer

What you describe is an update join: a join in which some values in the original dataset are replaced by values from a second table. This is easy to do efficiently with data.table, thanks to its fast joins and its update-by-reference operator (:=).

Here's an example for your toy data:

library(data.table)
setDT(df)             # convert to data.table without copy
setDT(sub_df)         # convert to data.table without copy

# join and update "df" by reference, i.e. without copy 
df[sub_df, on = c("id", "animal"), weight := i.weight]

The data is now updated:

#   id animal weight
#1:  1    dog   23.0
#2:  2    cat    2.2
#3:  3   duck    1.2
#4:  4  fairy    0.2
#5:  5  snake    1.3
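
The same join should also cover the case raised in the question, where the original table has more columns than the subset. The following is a minimal, self-contained sketch of that situation; the height values are made up purely for illustration and are not part of the original data.

```r
library(data.table)

# Toy data with an extra column; the height values are hypothetical
df <- data.table(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 height = c(60, 25, 30, 10, 5))

# The subset carries only the key columns and the treated variable
sub_df <- data.table(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"))

# Update join: "weight" changes only in rows matched on the keys;
# "height" and the row order of "df" are left untouched
df[sub_df, on = c("id", "animal"), weight := i.weight]
df
```

Because := modifies df in place, no result needs to be assigned; rows without a match in sub_df keep their old weight.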

You can use setDF to switch back to an ordinary data.frame.
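
If you would rather stay within dplyr, one possible alternative (not part of the answer above) is rows_update(), assuming dplyr 1.0.0 or later is installed. The sketch below rebuilds the toy data with a made-up height column and integer ids in both tables so that the key types match; treat it as an illustration under those assumptions rather than a definitive solution.

```r
library(dplyr)

# Toy data; the height values are hypothetical, added only to show
# that columns missing from the subset are preserved
df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 height = c(60, 25, 30, 10, 5),
                 stringsAsFactors = FALSE)

sub_df <- data.frame(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"),
                     stringsAsFactors = FALSE)

# Overwrite the matching rows of df with the values from sub_df;
# every key in sub_df must already exist in df
updated <- rows_update(df, sub_df, by = c("id", "animal"))
updated
```

Unlike the data.table version, this returns a new data frame and leaves df unchanged. The row order of df is kept, so no anti_join() and bind_rows() round trip is needed.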
