Fast URL query with R

Question

Hi, I have to query a website 10,000 times, and I am looking for a really fast way to do it with R.

A template URL:

url <- "http://mutationassessor.org/?cm=var&var=7,55178574,G,A"

My code is:

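library(XML)     # readHTMLTable()
library(gtools)  # smartbind()

# Fetch the first page and keep its 10th HTML table
url <- mydata$mutationassessorurl[1]
rawurl <- readHTMLTable(url)
Mutator <- data.frame(rawurl[[10]])

# Fetch the remaining pages one at a time and append each result
for (i in 2:27566) {
  url <- mydata$mutationassessorurl[i]
  rawurl <- readHTMLTable(url)
  Mutator <- smartbind(Mutator, data.frame(rawurl[[10]]))
  print(i)  # progress indicator
}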

Using microbenchmark, I measured about 680 milliseconds per query. I was wondering if there is a faster way to do it!

Thanks

Recommended answer

One way to speed up HTTP connections is to leave the connection open between requests. The following example shows the difference it makes for httr. The first option is most similar to the default behaviour in RCurl.

library(httr)
test_server <- "http://had.co.nz"

# Return times in ms for easier comparison
timed_GET <- function(...) {
  req <- GET(...)
  round(req$times * 1000)
}

# Create a new handle for every request - no connection sharing
rowMeans(replicate(20, 
  timed_GET(handle = handle(test_server), path = "index.html")
))

##      redirect    namelookup       connect   pretransfer starttransfer 
##          0.00         20.65         75.30         75.40        133.20 
##         total 
##        135.05

test_handle <- handle(test_server)
# Reuse the same handle for multiple requests
rowMeans(replicate(20, 
  timed_GET(handle = test_handle, path = "index.html")
))

##      redirect    namelookup       connect   pretransfer starttransfer 
##          0.00          0.00          2.55          2.55         59.35 
##         total 
##         60.80

# With httr, handles are automatically pooled
rowMeans(replicate(20,
  timed_GET(test_server, path = "index.html")
))

##      redirect    namelookup       connect   pretransfer starttransfer 
##          0.00          0.00          2.55          2.55         57.75 
##         total 
##         59.40

Note the difference in namelookup and connect: if you're sharing a handle, you need to do each of these operations only once, which saves quite a bit of time.

There's quite a lot of variation from one request to the next, so on average the last two methods should be very similar.
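
Applied to the loop in the question, the same idea might look like the sketch below. It is only a sketch under a few assumptions: the httr and XML packages are installed, the table of interest is still the 10th one on each page, and mydata$mutationassessorurl holds full URLs as in the question. fetch_table() and fetch_all() are hypothetical helper names, not part of any package. Because every URL points at the same host, httr's automatic handle pooling should let plain GET() calls share one connection. Collecting the results in a list and binding them once at the end also avoids growing Mutator inside the loop, which gets slower as the data frame gets longer.

```r
# Hypothetical helper: download one page through httr (so the pooled
# connection is reused) and return one of its HTML tables.
fetch_table <- function(url, index = 10) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  html <- httr::content(resp, as = "text", encoding = "UTF-8")
  doc <- XML::htmlParse(html, asText = TRUE)
  data.frame(XML::readHTMLTable(doc)[[index]])
}

# Hypothetical helper: apply `fetch` to every URL and keep the results in a
# list. A failed request yields NULL instead of stopping the whole run.
fetch_all <- function(urls, fetch = fetch_table) {
  lapply(urls, function(u) tryCatch(fetch(u), error = function(e) NULL))
}

# Usage (needs network access, so it is left commented out):
# tables  <- fetch_all(mydata$mutationassessorurl)
# tables  <- Filter(Negate(is.null), tables)       # drop failed requests
# Mutator <- do.call(gtools::smartbind, tables)    # bind once at the end
```

How much this helps depends on where the 680 ms goes: connection reuse only removes the name lookup and connect steps, so time spent by the server producing the page is unaffected.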
