Applying a function returning list/matrix row-wise in data.table


Problem description

I am trying to do the steps mentioned in http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/ but using data.table. Especially step 8 listed there. Attached are my steps and the problem I'm running into:

library(data.table)
library(maps)
library(geosphere)
airports <- as.data.table(read.csv("http://datasets.flowingdata.com/tuts/maparcs/airports.csv", header=TRUE))
flights <- as.data.table(read.csv("http://datasets.flowingdata.com/tuts/maparcs/flights.csv", header=TRUE, as.is=TRUE))

setnames(airports,c("airport1",names(airports)[2:7]))
setkey(flights,airport1)
setkey(airports,airport1)
ap <- merge(flights,airports)
setkey(ap,airport2)
setnames(airports,c("airport2",names(airports)[2:7]))
setkey(airports,airport2)
setkey(ap,airport2)
ap2 <- merge(ap,airports)
ap3 <- ap2[,.(airport1,airport2,airline,cnt,lat.x,long.x,lat.y,long.y)]
## ap3[,inter:=gcIntermediate(c(long.x,lat.x),c(long.y,lat.y),n=100,addStartEnd=TRUE),]  ## Error in .pointsToMatrix(p1) : Wrong length for a vector, should be 2
## 
## Tried some more stuff but no luck!
## fn <- function(lonx,latx,lony,laty) gcIntermediate(c(lonx,latx),c(lony,laty),n=100,addStartEnd=TRUE)
## ap3[,do.call(fn,.SD),.SDcols=5:8] ## Error in (function (lonx, latx, lony, laty)  : unused arguments (lat.x = c(35.21401111, 35.2140 ... snip ...

So I searched Stack Overflow and tried the steps listed in [1] and [2], but couldn't get them to work. I remember reading somewhere (I cannot find it now) that data.table can store lists, but I cannot figure out how. Also, is there some way to debug functions in j apart from what's listed in Section 2.9 of the FAQ?

[1] Efficient row-wise operations on a data.table

[2] Applying a function to each row of a data.table

Recommended answer

This should really be a comment, but it doesn't fit there. For each pair of points p1 and p2, defined by c(long.x,lat.x) and c(long.y,lat.y) respectively, gcIntermediate returns a matrix (or a list; hereafter I focus on the matrix only), and the dimension of that matrix depends on the values of n and addStartEnd. For example, n=1 with addStartEnd=FALSE returns a 1 by 2 matrix, and n=1 with addStartEnd=TRUE returns a 3 by 2 matrix. With a data.table operation like yours, you can't simply append such values as a single column. I am not a data.table expert, but I think the right way is to do the operation row by row and then combine the results with rbindlist, e.g.:

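apt <- setDT(ap3)

# For each row, bind the row's attributes to its great-circle points,
# then stack the per-row results into one table
tt <- rbindlist(lapply(1:nrow(apt), function(i)
  cbind(apt[i, ],
        gcIntermediate(apt[i, c("long.x", "lat.x")],
                       apt[i, c("long.y", "lat.y")],
                       n = 100, addStartEnd = TRUE))))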

> tt
        airport1 airport2 airline cnt    lat.x     long.x    lat.y    long.y        lon      lat
     1:      CLT      ABE     all  56 35.21401  -80.94313 40.65236  -75.4404  -80.94313 35.21401
     2:      CLT      ABE     all  56 35.21401  -80.94313 40.65236  -75.4404  -80.89245 35.26904
     3:      CLT      ABE     all  56 35.21401  -80.94313 40.65236  -75.4404  -80.84171 35.32405
     4:      CLT      ABE     all  56 35.21401  -80.94313 40.65236  -75.4404  -80.79090 35.37904
     5:      CLT      ABE     all  56 35.21401  -80.94313 40.65236  -75.4404  -80.74002 35.43401
    ---                                                                                         
510710:      PHX      YUM      YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.50396 32.68840
510711:      PHX      YUM      YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.52947 32.68045
510712:      PHX      YUM      YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.55498 32.67250
510713:      PHX      YUM      YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.58048 32.66454
510714:      PHX      YUM      YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.60597 32.65658
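The dimension claim above, and the reason the original attempt fails with "Wrong length for a vector, should be 2", can be checked on a single pair of points. This is a minimal sketch: the coordinates are copied from the CLT/ABE and PHX rows in the output above, and the variable names (p1, p2, m_plain, m_ends, m_full, long_col, lat_col) are illustrative only.

```r
library(geosphere)

# One origin/destination pair as (longitude, latitude) vectors of length 2
p1 <- c(-80.94313, 35.21401)   # CLT
p2 <- c(-75.44040, 40.65236)   # ABE

# The number of rows depends on n and addStartEnd
m_plain <- gcIntermediate(p1, p2, n = 1,   addStartEnd = FALSE)
m_ends  <- gcIntermediate(p1, p2, n = 1,   addStartEnd = TRUE)
m_full  <- gcIntermediate(p1, p2, n = 100, addStartEnd = TRUE)
dim(m_plain)   # the answer reports 1 by 2
dim(m_ends)    # the answer reports 3 by 2
dim(m_full)    # n + 2 rows, as in the 102-row blocks of tt

# Without `by`, j sees whole columns, so c(long.x, lat.x) concatenates
# every longitude and every latitude instead of forming one point
long_col <- c(-80.94313, -112.00806)
lat_col  <- c(35.21401, 33.43417)
length(c(long_col, lat_col))   # 2 * number of rows, not 2
```

This is why the row-wise loop passes one row at a time: each call then receives exactly one (longitude, latitude) pair per endpoint.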

As per the suggestion of @Frank, you can proceed as follows using only data.table operations (where 102 = 100 (n) + 2 (addStartEnd=TRUE)):

ap3[,gcIntermediate(c(long.x,lat.x),c(long.y,lat.y),n=100,addStartEnd=TRUE),by=1:nrow(ap3)][,list(lon=head(V1,102),lat=tail(V1,102)),by=nrow]
        nrow        lon      lat
     1:    1  -80.94313 35.21401
     2:    1  -80.89245 35.26904
     3:    1  -80.84171 35.32405
     4:    1  -80.79090 35.37904
     5:    1  -80.74002 35.43401
    ---                         
510710: 5007 -114.50396 32.68840
510711: 5007 -114.52947 32.68045
510712: 5007 -114.55498 32.67250
510713: 5007 -114.58048 32.66454
510714: 5007 -114.60597 32.65658
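Two variations on the same idea may be worth trying; both are sketches under stated assumptions rather than part of the original answer. The first returns a named list from j, so lon and lat come out as columns without the head/tail step. The second stores one matrix per row in a list column, which is what the question recalls data.table being able to do. The two-row table `routes` and the names `arcs` and `inter` are hypothetical, with coordinates copied from the output above.

```r
library(data.table)
library(geosphere)

# Hypothetical two-route table built from values shown in the output above
routes <- data.table(
  airport1 = c("CLT", "PHX"),
  airport2 = c("ABE", "YUM"),
  long.x = c(-80.94313, -112.00806), lat.x = c(35.21401, 33.43417),
  long.y = c(-75.44040, -114.60600), lat.y = c(40.65236, 32.65658)
)

# Variation 1: long format, with the grouping columns kept in the result.
# Returning a named list from j yields one column per list element.
arcs <- routes[, {
  m <- gcIntermediate(c(long.x, lat.x), c(long.y, lat.y),
                      n = 100, addStartEnd = TRUE)
  list(lon = m[, 1], lat = m[, 2])
}, by = .(airport1, airport2)]

# Variation 2: a list column holding one matrix per row.
# The inner list() wraps the matrix so it occupies a single cell.
routes[, inter := list(list(gcIntermediate(c(long.x, lat.x),
                                           c(long.y, lat.y),
                                           n = 100, addStartEnd = TRUE))),
       by = seq_len(nrow(routes))]

dim(routes$inter[[1]])   # matrix of points for the first route
```

Variation 1 assumes the `by` columns identify each row uniquely; in ap3 that would probably mean adding airline to the grouping, or grouping by row number as in the answer's code.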
