Interpolate values from a grid efficiently in R

Problem description

I have a grid of ocean depth data by location, and am trying to interpolate depth values for a selection of GPS points.

We've been using RSAGA::pick.from.points, which works fine for small data sets.

require(RSAGA)

# Reference grid of depths by location.
# (Longitudes negated here so they match the query points below;
# the original post listed them as positive.)
depthdata <- cbind.data.frame(
  x = c(-74.136, -74.135, -74.134, -74.133, -74.132,
        -74.131, -74.130, -74.129, -74.128, -74.127),
  y = rep(40, times = 10),
  depth = c(-0.6, -0.6, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.6, -0.6)
)

# GPS points we want depth estimates for
mylocs <- rbind(c(-74.1325, 40), c(-74.1305, 40))
colnames(mylocs) <- c("x", "y")

# Nearest-neighbour lookup via RSAGA
results <- pick.from.points(data = mylocs, src = depthdata,
                            pick = c("depth"), method = "nearest.neighbour")
mydepths <- results$depth

But our depth data set contains 69 million data points, and we have 5 million GPS points that we'd like depth estimates for, and pick.from.points is just taking too long (> 2 weeks) for this data set. We think that we could accomplish this task more quickly in MATLAB or ArcMap, but we're trying to incorporate this task into a longer workflow in R that we're writing for other people to run repeatedly, so switching to proprietary software for part of that workflow is less than desirable.

We'd be willing to sacrifice some degree of accuracy for speed.

I've looked for a solution as best as I can, but I'm fairly new to grid data and interpolation, so might be using inappropriate language and therefore missing a simple solution.

Answer

If you were willing to impute by finding the nearest neighbor and using its value, I think the trick would be to use an efficient nearest neighbors implementation that allows you to find the nearest neighbor among n alternatives in O(log(n)) time. The k-d tree provides this sort of performance, and is available through the FNN package in R. While the computation (on randomly generated data with 69 million data points for reference and 5 million data points to impute) isn't instantaneous (it takes about 3 minutes), it's much quicker than 2 weeks!

# Randomly generated stand-ins: 69 million reference points, each with a
# value, and 5 million query points to impute
data <- cbind(x = rnorm(6.9e7), y = rnorm(6.9e7))
labels <- rnorm(6.9e7)
query <- cbind(x = rnorm(5e6), y = rnorm(5e6))

library(FNN)

# For each query point, find the single nearest reference point via a
# k-d tree, then return that point's label
get.nn <- function(data, labels, query) {
  nns <- get.knnx(data, query, k = 1)
  labels[nns$nn.index]
}
system.time(get.nn(data, labels, query))
#    user  system elapsed
# 174.975   2.236 177.617
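Applied to the small depthdata/mylocs example from the question, the same lookup is a one-liner replacement for pick.from.points. This is a sketch that assumes the FNN package is installed and that the grid longitudes are negative (matching mylocs):

```r
library(FNN)

# The question's reference grid, longitudes negated to match mylocs
depthdata <- data.frame(
  x = c(-74.136, -74.135, -74.134, -74.133, -74.132,
        -74.131, -74.130, -74.129, -74.128, -74.127),
  y = rep(40, times = 10),
  depth = c(-0.6, -0.6, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.6, -0.6)
)
mylocs <- cbind(x = c(-74.1325, -74.1305), y = c(40, 40))

# k = 1 nearest neighbour: index into the grid, then read off its depth
nns <- get.knnx(depthdata[, c("x", "y")], mylocs, k = 1)
mydepths <- depthdata$depth[nns$nn.index]
mydepths  # both query points fall inside the -0.9 band
```

Both query longitudes sit between grid points whose depth is -0.9, so either tie-break gives the same answer.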

As a warning, the process peaked around 10GB of RAM, so you will need significant memory resources to run on a dataset of your size.
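If 10GB is more than you can spare, one option (a sketch, not part of the original answer) is to impute the query points in chunks. Peak memory then scales with the chunk size rather than with all 5 million queries at once, at the cost of get.knnx rebuilding the k-d tree for each chunk:

```r
library(FNN)

# Chunked variant of get.nn: same result, lower peak memory.
# chunk.size is a tuning knob: smaller chunks use less RAM but
# repeat the k-d tree build more often.
get.nn.chunked <- function(data, labels, query, chunk.size = 5e5) {
  n <- nrow(query)
  out <- numeric(n)
  for (start in seq(1, n, by = chunk.size)) {
    end <- min(start + chunk.size - 1, n)
    nns <- get.knnx(data, query[start:end, , drop = FALSE], k = 1)
    out[start:end] <- labels[nns$nn.index]
  }
  out
}
```

The `drop = FALSE` keeps a single-row final chunk as a matrix, which get.knnx expects.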
