R - Assign column value based on closest match in second data frame
I have two data frames, logger and df (times are numeric):
logger <- data.frame(
  time = c(1280248354:1280248413),
  temp = runif(60, min = 18, max = 24.5)
)
df <- data.frame(
  obs = c(1:10),
  time = runif(10, min = 1280248354, max = 1280248413),
  temp = NA
)
I would like to search logger$time for the closest match to each row in df$time, and assign the associated logger$temp to df$temp. So far, I have been successful using the following loop:
for (i in 1:length(df$time)) {
  closestto <- which.min(abs(logger$time - df$time[i]))
  df$temp[i] <- logger$temp[closestto]
}
However, I now have large data frames (logger has 13,620 rows and df has 266,138) and processing times are long. I've read that loops are not the most efficient way to do things, but I am unfamiliar with the alternatives. Is there a faster way to do this?
I'd use data.table for this. It makes joining on keys very easy and very fast. There is even a really helpful roll = "nearest" argument that gives exactly the behaviour you are looking for: each time in df is matched to the closest time in logger, so exact matches are not required (your example needs this, because the runif() times in df will almost never coincide exactly with a time in logger). In the following example I renamed df$time to df$time1 to make it clear which column belongs to which table:
# Load package
library(data.table)
# Rename df$time to df$time1 so the two time columns are easy to tell apart
names(df)[names(df) == "time"] <- "time1"
# Make data.frames into data.tables with a key column
ldt <- data.table(logger, key = "time")
dt <- data.table(df, key = "time1")
# Join based on the key column of the two tables (time & time1)
# roll = "nearest" matches each time1 to the closest time in ldt
# list(obs, time1, temp) selects the columns to return: obs and time1
# come from dt, temp is the matched value from ldt
ldt[dt, list(obs, time1, temp), roll = "nearest"]
# time obs time1 temp
# 1: 1280248361 8 1280248361 18.07644
# 2: 1280248366 4 1280248366 21.88957
# 3: 1280248370 3 1280248370 19.09015
# 4: 1280248376 5 1280248376 22.39770
# 5: 1280248381 6 1280248381 24.12758
# 6: 1280248383 10 1280248383 22.70919
# 7: 1280248385 1 1280248385 18.78183
# 8: 1280248389 2 1280248389 18.17874
# 9: 1280248393 9 1280248393 18.03098
#10: 1280248403 7 1280248403 22.74372
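If you would rather not add a package, the same nearest-match lookup can be sketched in base R with findInterval(), which locates each df time between two neighbouring logger times in one vectorised call. This is not part of the answer above, only a minimal sketch: nearest_index is a hypothetical helper name, and it assumes logger$time is sorted in strictly increasing order and has at least two rows (sort logger with order() first if it is not). The example works on the original data frames from the question, where the time column in df is still called time.

```r
# Hypothetical helper (not from the answer above): for each value in x,
# return the index of the closest value in sorted_ref.
# Assumes sorted_ref is strictly increasing and has at least two elements.
nearest_index <- function(x, sorted_ref) {
  n <- length(sorted_ref)
  # Index of the last reference value <= x (0 if x lies below all of them)
  lo <- findInterval(x, sorted_ref)
  # Clamp so that both lo and lo + 1 are valid indices
  lo <- pmin(pmax(lo, 1L), n - 1L)
  hi <- lo + 1L
  # Keep whichever neighbour is closer; ties go to the lower index,
  # which is also what which.min() does in the original loop
  ifelse(abs(x - sorted_ref[lo]) <= abs(sorted_ref[hi] - x), lo, hi)
}

# Example data as in the question (seed added only for reproducibility)
set.seed(1)
logger <- data.frame(
  time = 1280248354:1280248413,
  temp = runif(60, min = 18, max = 24.5)
)
df <- data.frame(
  obs = 1:10,
  time = runif(10, min = 1280248354, max = 1280248413),
  temp = NA
)

# Assign the temperature of the nearest logger reading to every row of df
df$temp <- logger$temp[nearest_index(df$time, logger$time)]
```

Because findInterval() does a binary search per value instead of scanning every logger row, this should scale much better than the loop, and it assigns the result straight into df$temp as the question asks.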