Replace specific values based on another dataframe
Question
First, let's start with DataFrame 1 (DF1):
DF1 <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
"06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
"06/22/2016", "06/23/2016"),
c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
"NY", "NY", "NY"))
colnames(DF1) <- c("date", "id", "sales", "cost", "city")
I also have DataFrame 2 (DF2):
DF2 <- data.frame(c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
c(1, 1, 2, 2),
c(9999, 8888, 777, 555),
c("LON", "LON", "QC", "QC"))
colnames(DF2) <- c("date", "id", "sales", "city")
For every row in DF1, I have to check whether there is a row in DF2 with the same date and id. If there is, I have to replace the values in DF1 with the values from DF2.
DF2 will always have fewer columns than DF1. If a column is not in DF2, I must keep the original DF1 value for that column.
The final output should look like this:
results <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
"06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
"06/22/2016", "06/23/2016"),
c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
c(9999, 150, 151, 152, 155, 84, 83, 80, 777, 555),
c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
c("LON", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
"NY", "QC", "QC"))
colnames(results) <- c("date", "id", "sales", "cost", "city")
Do you have any suggestions?
Recommended answer
You can use data.table for this:
library(data.table)
setDT(DF1)
setDT(DF2)
DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]
which gives:
> DF1
          date id sales cost city
 1: 06/19/2016  1  9999  101  LON
 2: 06/20/2016  1   150  102  MTL
 3: 06/21/2016  1   151  104  MTL
 4: 06/22/2016  1   152  107  MTL
 5: 06/23/2016  1   155   99  MTL
 6: 06/19/2016  2    84   55   NY
 7: 06/20/2016  2    83   55   NY
 8: 06/21/2016  2    80   56   NY
 9: 06/22/2016  2   777   57   QC
10: 06/23/2016  2   555   58   QC
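To spell out what the join does: DF2 plays the role of the lookup table, the i. prefix refers to DF2's columns, and := overwrites the matching rows of DF1 in place. DF2 rows without a match in DF1 (here 06/27/2016 with id 1) are simply ignored. A minimal self-contained sketch of that behaviour follows. The data is the question's, rewritten with named columns, and stringsAsFactors = FALSE is an assumption added here so that city is a character column on every R version.

```r
library(data.table)

# Same data as in the question, written with named columns.
# stringsAsFactors = FALSE is an assumption made for this sketch.
DF1 <- data.frame(
  date  = rep(c("06/19/2016", "06/20/2016", "06/21/2016",
                "06/22/2016", "06/23/2016"), times = 2),
  id    = rep(c(1, 2), each = 5),
  sales = c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
  cost  = c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
  city  = rep(c("MTL", "NY"), each = 5),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  date  = c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
  id    = c(1, 1, 2, 2),
  sales = c(9999, 8888, 777, 555),
  city  = c("LON", "LON", "QC", "QC"),
  stringsAsFactors = FALSE
)

setDT(DF1)  # converts to data.table in place, without copying
setDT(DF2)

# Update join: for each DF2 row, find the DF1 rows with the same date and id
# and overwrite city and sales there with DF2's values.
DF1[DF2, on = .(date, id), `:=`(city = i.city, sales = i.sales)]

nrow(DF1)             # 10: the join updates rows, it does not add any
8888 %in% DF1$sales   # FALSE: the 06/27/2016 row of DF2 matched nothing
DF1[id == 2, sales]   # 84 83 80 777 555
```

Because := works by reference, DF1 itself is changed. If the original table is still needed, run the join on copy(DF1) instead.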
When you have many columns in both datasets, it is easier to use mget instead of typing all the column names. For the data used in the question it would look like:
DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
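For readers who have not used mget before: it looks up several objects by name and returns them as a list, which is the shape := expects on its right-hand side. Outside of data.table the idea can be sketched as follows. The environment env and its contents are made up for illustration and only stand in for the i.-prefixed columns that data.table makes available during the join.

```r
cols <- c("sales", "city")   # what names(DF2)[3:4] evaluates to for the question's DF2

# Build the prefixed names that refer to DF2's columns inside the join.
prefixed <- paste0("i.", cols)
prefixed
# "i.sales" "i.city"

# Made-up environment standing in for the columns visible inside the join.
env <- list2env(list(i.sales = c(9999, 777), i.city = c("LON", "QC")))

# mget() fetches every object named in `prefixed` and returns a named list.
picked <- mget(prefixed, envir = env)
str(picked)
```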
When you want to construct the vector of column names to be updated beforehand, you could do this as follows:
cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]
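The positional index 3:4 depends on the column order of DF2. One possible variation, sketched below, derives the columns by name instead and wraps the join in a small function. update_from is a hypothetical helper written for this example, not part of data.table, and stringsAsFactors = FALSE is again an assumption so that city is a character column.

```r
library(data.table)

# Hypothetical helper (not part of data.table): in every row of x that y
# matches on `keys`, overwrite the non-key columns the two tables share.
update_from <- function(x, y, keys) {
  x <- setDT(copy(x))  # work on copies so the caller's objects stay unchanged
  y <- setDT(copy(y))
  cols <- setdiff(intersect(names(x), names(y)), keys)
  if (length(cols) == 0L) return(x[])  # nothing to update
  x[y, on = keys, (cols) := mget(paste0("i.", cols))]
  x[]
}

# The question's data, written with named columns.
DF1 <- data.frame(
  date  = rep(c("06/19/2016", "06/20/2016", "06/21/2016",
                "06/22/2016", "06/23/2016"), times = 2),
  id    = rep(c(1, 2), each = 5),
  sales = c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
  cost  = c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
  city  = rep(c("MTL", "NY"), each = 5),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  date  = c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
  id    = c(1, 1, 2, 2),
  sales = c(9999, 8888, 777, 555),
  city  = c("LON", "LON", "QC", "QC"),
  stringsAsFactors = FALSE
)

res <- update_from(DF1, DF2, keys = c("date", "id"))
res$sales      # 9999 150 151 152 155 84 83 80 777 555
DF1$sales[1]   # 149: the original data frame is left as it was
```

Working on a copy costs memory on large tables. Dropping the copy() calls gives the in-place behaviour of the answer above.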