How to rbind new rows from one data frame to an existing data frame in R
Problem description
I would like to know how to append new data (rows) from one data frame, df2, to an existing data frame, df1, based on a unique value in each table. So I have an existing data frame, df1, that has historical data and each row has a unique value. I then pull data from the web and put it into a new data frame, df2. The new data frame also includes a unique value which may or may not match a unique value in df1.
I would like to take all rows in df2 that have a unique value that does not exist in df1, and append those rows to df1. My initial thoughts were to use code similar to this:
ifelse(any(df1$unique_val==df2$unique_val), df1 <- df1, df1 <- rbind(df2, df1))
But then I realized that I need a more one-to-one match than an "any" match. I know how I would do this in SQL with a UNION and WHERE clause, but I'm not sure how to make it work in R. The only related items I could find researching were appending all data from two data frames or appending a new column to an existing data frame.
The following example shows what I am looking for and why I am not looking to "join" these two data frames:
df1 = data.frame(numb = c(1:6), rand = c(rep("Toaster",6)))
df1$unique_val <- paste0(df1$numb, df1$rand)
> df1
  numb    rand unique_val
1    1 Toaster   1Toaster
2    2 Toaster   2Toaster
3    3 Toaster   3Toaster
4    4 Toaster   4Toaster
5    5 Toaster   5Toaster
6    6 Toaster   6Toaster
df2 = data.frame(numb = c(5:7), rand = c(rep("Toaster",2), c(rep("Radio",1))))
df2$unique_val <- paste0(df2$numb, df2$rand)
> df2
  numb    rand unique_val
1    5 Toaster   5Toaster
2    6 Toaster   6Toaster
3    7   Radio     7Radio
As you can see, row 3 in df2 is the only new row (a row that does not have a matching unique_val in df1). I would like to add this new row to df1. Note: it's not always the same row that is new in df2.
I tried each of the joins from the post "merge/join data frames", as follows:
merge(df1,df2, by = "unique_val")
merge(df1,df2, by = "unique_val", all = TRUE)
merge(df1,df2, by = "unique_val", all.x = TRUE)
merge(df1,df2, by = "unique_val", all.y = TRUE)
I also tried the anti_join from dplyr:
anti_join(df1,df2, by = "unique_val")
Rbind gives me the following:
rbind(df1,df2)
  numb    rand unique_val
1    1 Toaster   1Toaster
2    2 Toaster   2Toaster
3    3 Toaster   3Toaster
4    4 Toaster   4Toaster
5    5 Toaster   5Toaster
6    6 Toaster   6Toaster
7    5 Toaster   5Toaster
8    6 Toaster   6Toaster
9    7   Radio     7Radio
None of these gives me the desired output, which is the following:
  numb    rand unique_val
1    1 Toaster   1Toaster
2    2 Toaster   2Toaster
3    3 Toaster   3Toaster
4    4 Toaster   4Toaster
5    5 Toaster   5Toaster
6    6 Toaster   6Toaster
7    7   Radio     7Radio
I'm looking to rbind these data frames, not join them.
Recommended answer
We can use rbindlist/unique from data.table. We place the datasets in a list, use rbindlist (from data.table) to rbind the datasets in the list into a single data.table, and then keep the unique rows with unique from data.table, which also has a by option to specify the variable used to detect duplicates.
library(data.table)
unique(rbindlist(list(df1, df2)), by = "numb")
#    numb    rand unique_val
#1:     1 Toaster   1Toaster
#2:     2 Toaster   2Toaster
#3:     3 Toaster   3Toaster
#4:     4 Toaster   4Toaster
#5:     5 Toaster   5Toaster
#6:     6 Toaster   6Toaster
#7:     7   Radio     7Radio
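If adding a package is not an option, the same result can probably be reached in base R by filtering df2 on the key before binding. The following is only a sketch under the question's assumptions (both data frames have the same columns, and unique_val is the key); the names new_rows and combined are introduced here for illustration and do not appear in the original post.

```r
# Rebuild the example data from the question.
df1 <- data.frame(numb = 1:6, rand = rep("Toaster", 6))
df1$unique_val <- paste0(df1$numb, df1$rand)

df2 <- data.frame(numb = 5:7, rand = c(rep("Toaster", 2), "Radio"))
df2$unique_val <- paste0(df2$numb, df2$rand)

# Keep only the df2 rows whose key is not already present in df1.
# Note: `!` binds more loosely than %in%, so this is !(a %in% b).
new_rows <- df2[!df2$unique_val %in% df1$unique_val, ]

# Append the new rows to the existing data, then reset the row names
# so they run from 1 to nrow(combined).
combined <- rbind(df1, new_rows)
rownames(combined) <- NULL
combined
```

Unlike the unique(..., by = "numb") call above, this compares on unique_val and never drops rows that are already in df1, so any duplicates within df1 itself would be kept. Whether that is the desired behaviour depends on the data.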