How to rbind new rows from one data frame to an existing data frame in R
Question
I would like to know how to append new data (rows) from one data frame, df2, to an existing data frame, df1, based on a unique value in each table. So I have an existing data frame, df1, that has historical data and each row has a unique value. I then pull data from the web and put it into a new data frame, df2. The new data frame also includes a unique value which may or may not match a unique value in df1.
I would like to take all rows in df2 that have a unique value that does not exist in df1, and append those rows to df1. My initial thoughts were to use code similar to this:
ifelse(any(df1$unique_val == df2$unique_val), df1 <- df1, df1 <- rbind(df2, df1))
But then I realized that I need a more one-to-one match than an "any" match. I know how I would do this in SQL with a UNION and WHERE clause, but I'm not sure how to make it work in R. The only related items I could find researching were appending all data from two data frames or appending a new column to an existing data frame.
The following example shows what I am looking for and why I am not looking to "join" these two data frames:
df1 = data.frame(numb = c(1:6), rand = c(rep("Toaster", 6)))
df1$unique_val <- paste0(df1$numb, df1$rand)
> df1
  numb    rand unique_val
1    1 Toaster   1Toaster
2    2 Toaster   2Toaster
3    3 Toaster   3Toaster
4    4 Toaster   4Toaster
5    5 Toaster   5Toaster
6    6 Toaster   6Toaster
df2 = data.frame(numb = c(5:7), rand = c(rep("Toaster", 2), c(rep("Radio", 1))))
df2$unique_val <- paste0(df2$numb, df2$rand)
> df2
  numb    rand unique_val
1    5 Toaster   5Toaster
2    6 Toaster   6Toaster
3    7   Radio     7Radio
As you can see, row 3 in df2 is the only new row (a row that does not have a matching unique_val in df1). I would like to add this new row to df1. Note: it's not always the same row that is new in df2.
I used each of the joins from this post, merge/join data frames as follows:
merge(df1, df2, by = "unique_val")
merge(df1, df2, by = "unique_val", all = TRUE)
merge(df1, df2, by = "unique_val", all.x = TRUE)
merge(df1, df2, by = "unique_val", all.y = TRUE)
I also tried the anti_join from dplyr:
anti_join(df1, df2, by = "unique_val")
rbind gives me the following:
rbind(df1, df2)
  numb    rand unique_val
1    1 Toaster   1Toaster
2    2 Toaster   2Toaster
3    3 Toaster   3Toaster
4    4 Toaster   4Toaster
5    5 Toaster   5Toaster
6    6 Toaster   6Toaster
7    5 Toaster   5Toaster
8    6 Toaster   6Toaster
9    7   Radio     7Radio
None of which give me the desired output of the following:
  numb    rand unique_val
1    1 Toaster   1Toaster
2    2 Toaster   2Toaster
3    3 Toaster   3Toaster
4    4 Toaster   4Toaster
5    5 Toaster   5Toaster
6    6 Toaster   6Toaster
7    7   Radio     7Radio
I'm looking to rbind these data frames, not join them.
Recommended answer
We can use rbindlist/unique from data.table. We place the datasets in a list, use rbindlist (from data.table) to rbind the datasets in the list into a single data.table, and then get the unique rows with unique from data.table, which also has a by option to specify the variable. Because df1 comes first in the list, its row is the one kept whenever a key appears in both data frames.
library(data.table)
unique(rbindlist(list(df1, df2)), by = "numb")
#   numb    rand unique_val
#1:    1 Toaster   1Toaster
#2:    2 Toaster   2Toaster
#3:    3 Toaster   3Toaster
#4:    4 Toaster   4Toaster
#5:    5 Toaster   5Toaster
#6:    6 Toaster   6Toaster
#7:    7   Radio     7Radio
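For readers who would rather stay in base R, here is a minimal sketch of the same idea, keyed on unique_val as the question describes (the answer above deduplicates on numb, which happens to give the same result for this example). The names is_new and result are illustrative and not from the original post. The sketch assumes that unique_val really is unique within each data frame.

```r
# Rebuild the example data from the question
df1 <- data.frame(numb = 1:6, rand = rep("Toaster", 6))
df1$unique_val <- paste0(df1$numb, df1$rand)

df2 <- data.frame(numb = 5:7, rand = c(rep("Toaster", 2), "Radio"))
df2$unique_val <- paste0(df2$numb, df2$rand)

# One logical value per row of df2: TRUE when its key is absent from df1.
# %in% tests each element separately, unlike any(), which collapses
# everything into a single TRUE/FALSE.
is_new <- !(df2$unique_val %in% df1$unique_val)

# Append only the new rows, then reset the row names to 1..n
result <- rbind(df1, df2[is_new, ])
rownames(result) <- NULL
result
```

With dplyr, the same filter could probably be written as bind_rows(df1, anti_join(df2, df1, by = "unique_val")). Note that the arguments are in the opposite order from the attempt in the question, since anti_join(x, y) keeps the rows of x that have no match in y.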