Line by line reading from HTTPS connection in R

Problem description

When a connection is created with open="r", it allows for line-by-line reading, which is useful for batch processing large data streams. For example, this script parses a sizable gzipped JSON HTTP stream by reading 100 lines at a time. Unfortunately, however, R does not support SSL:

> readLines(url("https://api.github.com/repos/jeroenooms/opencpu"))
Error in readLines(url("https://api.github.com/repos/jeroenooms/opencpu")) : 
  cannot open the connection: unsupported URL scheme

The RCurl and httr packages do support HTTPS, but I don't think they are capable of creating a connection object similar to url(). Is there some other way to do line-by-line reading of an HTTPS connection similar to the example in the script above?

Recommended answer

One solution is to manually call the curl executable via pipe. The following seems to work.

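library(jsonlite)

# Stream the gzipped file through the curl executable and decompress it on the fly
gzstream <- gzcon(pipe("curl https://jeroenooms.github.io/files/hourly_14.json.gz", open="r"))
batches <- list(); i <- 1
# Read 100 lines at a time until the stream is exhausted
while(length(records <- readLines(gzstream, n = 100))){
  message("Batch ", i, ": found ", length(records), " lines of json...")
  # Wrap the newline-delimited records into a single JSON array
  json <- paste0("[", paste0(records, collapse=","), "]")
  batches[[i]] <- fromJSON(json, validate=TRUE)
  i <- i+1
}
# Combine the per-batch data frames into one
weather <- rbind.pages(batches)
rm(batches); close(gzstream)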

However, this is suboptimal because the curl executable might not be available for various reasons. It would be much nicer to open this stream directly via RCurl/libcurl.
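
One possible way to avoid the external executable, sketched here as an assumption rather than as part of the original answer, is the curl package, whose curl() function returns a regular R connection backed by libcurl. The helper name read_batches and its handler argument are hypothetical and only serve the illustration; the batching loop itself is the same readLines(con, n = 100) pattern used above.

```r
# Hypothetical helper: read an already-open connection in batches of `n` lines
# and pass each batch to `handler`. Returns a list with one result per batch.
read_batches <- function(con, n = 100, handler = identity) {
  results <- list()
  i <- 1
  # readLines() returns a zero-length vector once the stream is exhausted
  while (length(lines <- readLines(con, n = n)) > 0) {
    results[[i]] <- handler(lines)
    i <- i + 1
  }
  results
}

# Usage sketch: needs the curl package and network access, so it only runs
# in an interactive session where curl is installed.
if (interactive() && requireNamespace("curl", quietly = TRUE)) {
  # curl::curl() opens the HTTPS URL as a text-mode connection
  con <- curl::curl("https://api.github.com/repos/jeroenooms/opencpu", open = "r")
  batches <- read_batches(con, n = 100)
  close(con)
  print(length(batches))
}
```

For a gzipped stream such as the one in the answer, the connection would presumably have to be opened in binary mode and wrapped in gzcon() before being passed to the helper; that variant is not shown or tested here.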
