Downloading large files with R/RCurl efficiently

Question

I see that many examples of downloading binary files with RCurl look like this:

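library("RCurl")
curl <- getCurlHandle()
# getBinaryURL() accumulates the whole response body in memory as a raw vector
bfile <- getBinaryURL(
    "http://www.example.com/bfile.zip",
    curl = curl,
    progressfunction = function(down, up) { print(down) },
    noprogress = FALSE
)
# Only after the transfer has finished is the vector written to disk
writeBin(bfile, "bfile.zip")
rm(curl, bfile)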

If the download is very large, I suppose it would be better to write it to the storage medium as it arrives, instead of fetching it all into memory.

The RCurl documentation has some examples that fetch files in chunks and process them as they are downloaded, but they all seem to deal with text chunks.

Can you give a working example?
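
For comparison, the chunk-by-chunk idea can be sketched for binary data with base R connections alone. This is a minimal sketch and not RCurl: the helper name stream_to_file and the 64 KiB chunk size are illustrative assumptions, and a plain url() connection offers none of RCurl's cookie or form handling. It reads raw bytes with readBin() and writes each chunk with writeBin() as soon as it arrives, so memory use stays at roughly one chunk regardless of the file size.

```r
# Hypothetical helper (not part of RCurl): copy a URL to disk in fixed-size
# binary chunks so that only one chunk is held in memory at a time.
stream_to_file <- function(url_string, dest, chunk_size = 65536L) {
  src <- url(url_string, open = "rb")      # binary read connection
  on.exit(close(src), add = TRUE)
  out <- file(dest, open = "wb")            # binary write connection
  on.exit(close(out), add = TRUE)
  total <- 0
  repeat {
    chunk <- readBin(src, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break          # end of stream reached
    writeBin(chunk, out)                    # write this chunk immediately
    total <- total + length(chunk)
  }
  invisible(total)                          # number of bytes written
}
```

Against a real server the call would be along the lines of stream_to_file("http://www.example.com/bfile.zip", "bfile.zip").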

A user suggests using R's native download.file with the mode = 'wb' option for binary files.
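
The native route looks roughly like this. download.file() writes the response to destfile instead of returning it as an R object, and mode = "wb" matters mainly on Windows, which distinguishes text files from binary files. The wrapper name native_download is a hypothetical convenience; download.file() itself returns 0 on success and a non-zero code on failure.

```r
# Hypothetical wrapper around base R's download.file() for binary content.
native_download <- function(url, destfile) {
  # mode = "wb" keeps the bytes unchanged; quiet = TRUE hides the progress meter
  status <- download.file(url, destfile, mode = "wb", quiet = TRUE)
  if (status != 0L) stop("download.file() failed with status ", status)
  invisible(destfile)
}
```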

In many cases the native function is a viable alternative, but there are a number of use cases where it does not fit (https, cookies, forms, etc.), and that is why RCurl exists.

Recommended answer

Here is a working example:

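library(RCurl)
# Open a C-level file handle in binary write mode
f <- CFILE("bfile.zip", mode = "wb")
# libcurl writes the response body directly into that handle
curlPerform(url = "http://www.example.com/bfile.zip", writedata = f@ref)
# Close the handle so any buffered data is flushed to disk
close(f)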

It downloads straight to the file. Instead of the downloaded data, the returned value is the status of the request (0 if no errors occur).
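
Because the return value is only a status code, it is worth checking it and cleaning up on failure. The sketch below is a hypothetical variant of the code above (bdown_checked is not part of RCurl). It assumes, as is usual with RCurl, that transfer failures are signalled as R errors, and it still checks the returned status as a fallback. The libcurl option failonerror = TRUE makes an HTTP error status (400 or above) fail the transfer instead of silently writing the server's error page into the file.

```r
library(RCurl)

# Hypothetical variant of the answer's code: fail on HTTP errors and never
# leave a truncated file behind.
bdown_checked <- function(url, file) {
  f <- CFILE(file, mode = "wb")
  status <- tryCatch(
    curlPerform(url = url, writedata = f@ref, failonerror = TRUE),
    error = function(e) {
      close(f)
      unlink(file)          # remove the partial download
      stop(e)               # re-signal the original error
    }
  )
  close(f)                  # flush and release the FILE handle
  if (status != 0L) {       # fallback in case a non-zero status is returned
    unlink(file)
    stop("curlPerform() returned status ", status)
  }
  invisible(status)
}
```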

The mention of CFILE in the RCurl manual is a bit terse; hopefully future versions will include more details and examples.

For convenience, here is the same code packaged as a function (with a progress bar):

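library(RCurl)

bdown <- function(url, file) {
    # Open the destination file as a binary C-level FILE handle
    f <- CFILE(file, mode = "wb")
    # Close the handle even if the transfer raises an error
    on.exit(close(f))
    # noprogress = FALSE switches on libcurl's built-in progress meter
    a <- curlPerform(url = url, writedata = f@ref, noprogress = FALSE)
    return(a)
}

## ...and now just give remote and local paths
ret <- bdown("http://www.example.com/bfile.zip", "path/to/bfile.zip")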
