考虑键列之间的范围匹配,合并两个数据框 [英] Merge two data frames considering a range match between key columns

查看:46
本文介绍了考虑键列之间的范围匹配,合并两个数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 R 编程的初学者.我目前正在尝试从包含 X 和 Y 坐标以及站点名称的数据框中检索一些站点名称,并将它们复制到具有特定点的不同数据框中.

I am a beginner in programming in R. I am at the moment trying to retrieve some site names from a dataframe containing the X and Y coordinates and site names and copy them into a different dataframe with specific points.

    FD <- matrix(data =c(rep(1, 500), rep(0, 500),
                     rnorm(1000, mean = 550000, sd=4000),
                     rnorm(1000, mean = 6350000, sd=20000), rep(NA, 1000)),
             ncol = 4, nrow = 1000, byrow = FALSE)
colnames(FD) <- c('Survival', 'X', 'Y', 'Site') 
FD <- as.data.frame(FD)

shpxt <- matrix(c(526654.7,526810.5 ,6309098,6309187,530405.4,530692,
                  6337699, 6338056,580432.7, 580541.9, 6380246,6380391,
                  585761.3, 585847.6, 6379665, 6379759, 584192.1, 584279.4,
                  6382358, 6382710, 583421.2, 583492.4, 6379356, 6379425,
                  532395.5, 532515.3 , 6336421, 6336587, 534694.6, 534791.2,
                  6335620, 6335740, 536749.8, 536957.5, 6337584, 6338130, 590049.6,
                  590419.4, 6372232, 6372432, 580443, 580756.5, 6386342, 6386473,
                  575263.9, 575413.7, 6380416, 6380530, 584625.1, 584753.9, 6381009,
                  6381335), ncol = 4, nrow = 13, byrow = TRUE)
sites <- c("Brandbaeltet", "Brusaa", "Granly", "Jerup Strand", "Knasborgvej",
           "Milrimvej", "Overklitten", "Oversigtsareal", "Sandmosen",
           "Strandby", "Troldkaer", "Vaagholt", "Videsletengen")
colnames(shpxt) <- c("Xmin", "Xmax", "Ymin", "Ymax")
shpxt <- as.data.frame(shpxt)
shpxt["Sites"] <- sites

我的方法是使用这样的嵌套 for 循环:

My approach is using a nested for loop like this:

    tester <- function(FD, shpxt)
{ for (i in 1:nrow(FD)) for (j in 1:nrow(shpxt))         # Open Function
{ if (FD[i,2] >= shpxt[j,1] | FD[i,2] <= shpxt[j,2] &    # Open Loop
      FD[i,3] >= shpxt[j,3] | FD[i,3] <= shpxt[j,4])
{                                                        # Open Consequent
  FD[i,4]=shpxt[j,5]
  {break}
} else                                                  # Close Consequent
{FD[i,4] <- NA                                          # Open alternative
}                                                      # Close alternative
}                                                      # Close loop
}                                                      # Close function

tester(FD, shpxt)

本质上,我想搜索 FD 中的 X 和 Y 坐标落入范围内的站点,并将站点名称复制到第 i 行的 FD$Site 中.当我在真实数据上运行循环时,我收到以下错误消息:

In essence I want to search for which site the X and Y coordinates in FD fall into range and copy the sitename into FD$Site in row i. When I run the loop on my real data I get the following error message:

test(FD, shpxt)
Error in if (FD[i, 2] >= shpxt[j, 1] | FD[i, 2] <= shpxt[j, 2] & FD[i,  : 
  missing value where TRUE/FALSE needed

如何让循环从这里转到循环将所需站点名称复制到我的 FD 中的位置?

How do I get the loop to go from here to where the loop will be copying the desired sitename into my FD?

亲切的问候Thøger

Kind Regards Thøger

推荐答案

考虑键列之间的范围匹配,您想要合并两个数据框.这里有两个解决方案.

You want to merge two data frames considering a range match between key columns. Here are two solutions.

library(sqldf)

output <- sqldf("select * from FD left join shpxt 
                on (FD.X >= shpxt.Xmin and FD.X <= shpxt.Xmax and
                    FD.Y >= shpxt.Ymin and FD.Y <= shpxt.Ymax ) ")

使用data.table

library(data.table)

# convert your datasets in data.table
  setDT(FD) 
  setDT(shpxt)

output <- FD[shpxt, on = .(X >= Xmin , X <= Xmax,                # indicate x range
                           Y >= Ymin , Y <= Ymax), nomatch = NA, # indicate y range
             .(Survival, X, Y, Xmin, Xmax, Ymin, Ymax, Sites )]  # indicate columns in the output

解决这个问题有不同的选择,你会在其他 SO 问题中找到它 此处此处.

There are different alternatives to solve this problem, as you will find it in other SO questions here and here.

ps.请记住,for 循环 不一定是最佳解决方案.

ps. Keep in mind that for loop is not necessarily the best solution.

这篇关于考虑键列之间的范围匹配,合并两个数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆