Line by line reading from HTTPS connection in R
Question
When a connection is created with open="r", it allows line-by-line reading, which is useful for batch processing large data streams. For example, this script parses a sizable gzipped JSON HTTP stream by reading 100 lines at a time. Unfortunately, R does not support SSL:
> readLines(url("https://api.github.com/repos/jeroenooms/opencpu"))
Error in readLines(url("https://api.github.com/repos/jeroenooms/opencpu")) :
cannot open the connection: unsupported URL scheme
The RCurl and httr packages do support HTTPS, but I don't think they are capable of creating a connection object similar to url(). Is there some other way to do line-by-line reading of an HTTPS connection, similar to the example in the script above?
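As a minimal sketch of the batching pattern meant here (the linked script is not reproduced), the loop below reads from an already open connection in chunks of 100 lines. The in-memory textConnection and the names lines_in and batch_sizes are stand-ins chosen for illustration, not part of the original script.

```r
# Stand-in data: 250 JSON-like lines served from an in-memory connection.
lines_in <- sprintf('{"id": %d}', 1:250)
con <- textConnection(lines_in, open = "r")

batch_sizes <- integer(0)
# readLines() returns character(0) at end of input, which ends the loop.
while (length(records <- readLines(con, n = 100))) {
  batch_sizes <- c(batch_sizes, length(records))
}
close(con)

print(batch_sizes)
```

The goal is to run the same loop with an HTTPS source in place of the textConnection.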
Recommended answer
One solution is to manually call the curl executable via pipe. The following seems to work.
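library(jsonlite)
# Let the curl executable fetch the file and decompress its output on the fly
gzstream <- gzcon(pipe("curl https://jeroenooms.github.io/files/hourly_14.json.gz", open="r"))
batches <- list(); i <- 1
# Read 100 lines at a time until the stream is exhausted
while(length(records <- readLines(gzstream, n = 100))){
  message("Batch ", i, ": found ", length(records), " lines of json...")
  # Wrap the batch of JSON records into a single JSON array
  json <- paste0("[", paste0(records, collapse=","), "]")
  batches[[i]] <- fromJSON(json, validate=TRUE)
  i <- i+1
}
# Combine the per-batch data frames into one
weather <- rbind.pages(batches)
rm(batches); close(gzstream)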
However, this is suboptimal because the curl executable might not be available for various reasons. It would be much nicer to invoke this pipe directly via RCurl/libcurl.
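One way to get there without the external executable, sketched here on the assumption that the curl R package (a libcurl binding, separate from RCurl) is installed, is its curl() function, which returns a connection object that can be wrapped in gzcon() much like url(). The helper name stream_json_https is hypothetical, and rbind_pages is used as the newer jsonlite name for rbind.pages.

```r
# Hypothetical helper: stream gzipped, line-delimited JSON over HTTPS and
# parse it in batches of `n` lines. Assumes the curl and jsonlite packages
# are installed; nothing is downloaded until the function is called.
stream_json_https <- function(url, n = 100) {
  con <- gzcon(curl::curl(url))
  open(con, "rb")            # keep the connection open across readLines() calls
  on.exit(close(con))
  batches <- list()
  i <- 1
  while (length(records <- readLines(con, n = n))) {
    # Wrap the batch of JSON records into a single JSON array
    json <- paste0("[", paste0(records, collapse = ","), "]")
    batches[[i]] <- jsonlite::fromJSON(json)
    i <- i + 1
  }
  # Combine the per-batch data frames into one
  jsonlite::rbind_pages(batches)
}
```

Calling stream_json_https("https://jeroenooms.github.io/files/hourly_14.json.gz") should then play the role of the pipe-based loop above, provided the file is still hosted at that address.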