Replace a subset of a data frame with dplyr join operations


Question

Suppose I need to apply a treatment to some column values of a data frame like this:

  id animal weight   height ...
  1    dog     23.0
  2    cat     NA
  3   duck     1.2
  4  fairy     0.2
  5  snake     BAD


df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 stringsAsFactors = FALSE)  # keep text columns as character

Suppose the treatment has to be done in a separate table, and its result is the following data frame, which is a subset of the original:

  id animal weight
  2    cat    2.2
  5  snake    1.3

sub_df <- data.frame(id = c(2, 5),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"),
                     stringsAsFactors = FALSE)  # keep text columns as character

Now I want to put everything back together, so I use an operation like this:

library(dplyr)

# drop the rows that were treated, then append their treated versions
df %>%
  anti_join(sub_df, by = c("id", "animal")) %>%
  bind_rows(sub_df)

 id animal weight
 4  fairy    0.2
 1    dog   23.0
 3   duck    1.2
 2    cat    2.2
 5  snake    1.3

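Is there a way to do this directly with a join operation?

If the subset contains only the key columns and the treated variable (id, animal, weight) rather than all variables of the original data frame (id, animal, weight, height), how can I combine the subset with the original?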

Recommended answer

What you describe is a join operation in which you update some values in the original dataset. This is easy to do efficiently with data.table, thanks to its fast joins and its update-by-reference operator (:=).

Here's an example for your toy data:

library(data.table)
setDT(df)             # convert to data.table without copy
setDT(sub_df)         # convert to data.table without copy

# join and update "df" by reference, i.e. without copy 
df[sub_df, on = c("id", "animal"), weight := i.weight]

The data is now updated:

#   id animal weight
#1:  1    dog   23.0
#2:  2    cat    2.2
#3:  3   duck    1.2
#4:  4  fairy    0.2
#5:  5  snake    1.3
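
The question also asks about the case where the original table has more columns than the treated subset. A minimal sketch of the same update join for that case follows. The names full_dt and fixed_dt and the height values are made up for illustration; only the weight column is assigned, so the other columns should stay as they were.

```r
library(data.table)

# Hypothetical data: the 'height' values are invented for illustration.
full_dt <- data.table(
  id     = 1:5,
  animal = c("dog", "cat", "duck", "fairy", "snake"),
  weight = c("23", NA, "1.2", "0.2", "BAD"),
  height = c(60, 25, 40, 10, 5)
)

# The treated subset carries only the key columns and the corrected variable.
fixed_dt <- data.table(
  id     = c(2L, 5L),
  animal = c("cat", "snake"),
  weight = c("2.2", "1.3")
)

# Update join: for rows of full_dt that match fixed_dt on the keys,
# overwrite weight with the value from fixed_dt (referenced as i.weight).
full_dt[fixed_dt, on = c("id", "animal"), weight := i.weight]

print(full_dt)
```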

You can use setDF to switch back to an ordinary data.frame.
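
If you would rather stay within dplyr, one possible alternative (not part of the answer above) is rows_update(), assuming dplyr 1.0.0 or later is installed. The sketch below rebuilds the toy data with integer ids in both tables; the name updated is chosen here for illustration. Unlike the data.table approach, this returns a new data frame instead of modifying df in place.

```r
library(dplyr)

# Toy data from the question, rebuilt with integer ids in both tables.
df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 stringsAsFactors = FALSE)

sub_df <- data.frame(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"),
                     stringsAsFactors = FALSE)

# For rows of df whose keys match a row of sub_df, replace the non-key
# columns present in sub_df (here only weight) with the values from sub_df.
updated <- rows_update(df, sub_df, by = c("id", "animal"))

print(updated)
```

Columns of df that are absent from sub_df are left untouched, so the same call should also cover the case where the original has extra columns such as height.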
