Loop over data.table rows with a condition


Question

I have a data.table that holds ids and locations. For example, here it is with two rows in it (it has column and row names; I don't know if that matters):

library(data.table)
locations <- data.table(c(11, 12), c(-159.58, 0.2), c(21.901, 22.221))
colnames(locations) <- c("id", "location_lon", "location_lat")
rownames(locations) <- c("1", "2")

I then want to iterate over the rows and compare each one to another point (given as lon/lat). With a for loop it works:

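library(Imap)  # provides gdist()

# return() only works inside a function, and return(NULL) belongs after
# the loop, otherwise only the first row is ever checked
find_near <- function(locations) {
    for (i in seq_len(nrow(locations))) {
        loc <- locations[i, ]
        dist <- gdist(-159.5801, 21.901, loc$location_lon, loc$location_lat, units = "m")
        if (dist <= 50) {
            return(loc)  # first row within 50 m
        }
    }
    return(NULL)  # no row within 50 m
}
find_near(locations)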

and returns:

   id location_lon location_lat
1: 11      -159.58       21.901

But I want to use apply(). The following code fails to run:

dists <- apply(locations,1,function(x) if (50 - gdist(-159.5801, 21.901, x$location_lon, x$location_lat, units="m")>=0) x else NULL)

It fails with the error "$ operator is invalid for atomic vectors". Changing to reference by position (x[2], x[3]) isn't enough to fix this; I get

Error in if (radius - gdist(lon, lat, x[2], x[3], units = "m") >= 0) x else NULL : 
missing value where TRUE/FALSE needed 

This is because apply() converts the data.table to a matrix, which can hold only one type, so when any column is non-numeric the coordinates are treated as text instead of numbers. Is there a way to overcome this? The solution needs to be efficient (I want to run this check for more than 1,000,000 different coordinates). Changing the data structure of the locations table is possible if needed.
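A small sketch of the coercion described above. It assumes a hypothetical variant of the table whose id column is character (the question's own example has a numeric id, in which case the matrix would stay numeric). It shows that apply() receives atomic vectors, and that converting only the coordinate columns keeps them numeric.

```r
library(data.table)

# Hypothetical variant of the question's table: a character id column
# is one way the whole matrix ends up as text
locations <- data.table(id = c("a", "b"),
                        location_lon = c(-159.58, 0.2),
                        location_lat = c(21.901, 22.221))

# apply() calls as.matrix() first; a matrix holds a single type,
# so one character column turns every cell into a string
m <- as.matrix(locations)
print(typeof(m))  # "character"

# Each row reaches the function as a named atomic vector, so `$` fails
err <- tryCatch(apply(locations, 1, function(x) x$location_lon),
                error = function(e) conditionMessage(e))
print(err)

# Converting only the numeric columns keeps the matrix numeric,
# so x["location_lon"] is a number inside apply()
coords <- as.matrix(locations[, .(location_lon, location_lat)])
print(typeof(coords))  # "double"
lons <- apply(coords, 1, function(x) x["location_lon"])
print(lons)
```

This removes the type problem, but apply() still makes one R-level call per row, so it does not address the efficiency requirement on its own.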

Solution

No loops are required; just use data.table as intended. If all you want to see are the rows that are within 50 meters of the desired location, all you have to do is

locations[, if (gdist(-159.58, 21.901, location_lon, location_lat, units="m") <= 50) .SD, id]
##    id location_lon location_lat
## 1: 11      -159.58       21.901

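Here we group by the id column of locations and check whether each id lies within 50 meters of (-159.58, 21.901). If it does, we return .SD, the subset of the data for that id (all columns except the grouping column, which data.table adds back to the result). Groups for which the if returns NULL are dropped from the result. Because each id is a single row here, gdist() is called with scalar coordinates.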
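With more than 1,000,000 coordinates, grouping by id still calls gdist() once per row. A minimal sketch of a fully vectorised alternative, assuming a spherical (haversine) distance is accurate enough: haversine_m() and the Earth radius of 6371008.8 m are my own assumptions, not part of the answer or of Imap. As far as I know gdist() uses an ellipsoidal model, so results can differ slightly from it.

```r
library(data.table)

# Hypothetical helper: great-circle distance in metres on a sphere
# (haversine formula). Vectorised: accepts whole columns at once.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) just above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

locations <- data.table(id = c(11, 12),
                        location_lon = c(-159.58, 0.2),
                        location_lat = c(21.901, 22.221))

# One vectorised call over the columns, used directly as the row filter
near <- locations[haversine_m(-159.5801, 21.901,
                              location_lon, location_lat) <= 50]
print(near)
```

The filter runs as a single pass over the columns instead of one function call per row, which is what matters at the scale mentioned in the question.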


As a side note, data.table doesn't use row names, so there is no need to specify them; see here, for example.
