Create a C-level file handle in RCurl for writing downloaded files


Problem description



In RCurl, a function and a class, both named CFILE, are defined to work with C-level file handles. From the manual:

The intent is to be able to pass these to libcurl as options so that it can read or write from or to the file. We can also do this with R connections and specify callback functions that manipulate these connections. But using the C-level FILE handle is likely to be significantly faster for large files.

There are no examples related to downloads, so I tried:

```r
library(RCurl)
u = "http://cran.r-project.org/web/packages/RCurl/RCurl.pdf"
f = CFILE("RCurl.pdf", mode = "wb")
ret = getURL(u, write = getNativeSymbolInfo("R_curl_write_binary_data")$address,
             file = f@ref)
```

I also tried replacing the file option with writedata = f@ref. The file is downloaded, but it is corrupted. Writing a custom callback for the write argument works only for non-binary data.
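The question does not show the custom callback, so the exact cause there is unknown. One common way a write callback ends up working "only for non-binary data" is that it treats each chunk as a NUL-terminated string, which silently drops everything after the first zero byte (binary formats such as PDF and JPEG contain many). The two hypothetical C helpers below contrast a string-based write with a length-based one:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical text-oriented writer: fputs() stops at the first '\0',
   so any bytes after an embedded zero byte are lost. */
size_t write_as_text(const char *buf, size_t len, FILE *out)
{
    (void) len;                  /* the real length is ignored: this is the bug */
    if (fputs(buf, out) == EOF)  /* also requires buf to be NUL-terminated */
        return 0;
    return strlen(buf);          /* bytes actually written */
}

/* Binary-safe writer: fwrite() copies exactly len bytes, zeros included. */
size_t write_as_bytes(const char *buf, size_t len, FILE *out)
{
    return fwrite(buf, 1, len, out);
}
```

With the 5-byte payload "ab\0cd", the text version writes only "ab", while the byte version writes all five bytes.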

Any idea how to download a binary file straight to disk (without loading it into memory) with RCurl?

Solution

I think you want to use writedata, and remember to close the file:

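```r
library(RCurl)
filename <- tempfile()
f <- CFILE(filename, "wb")                 # C-level FILE * opened for binary writing
url <- "http://cran.fhcrc.org/Rlogo.jpg"
curlPerform(url = url, writedata = f@ref)  # libcurl writes the body to that FILE *
close(f)                                   # close the handle so buffered data is flushed
```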
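This works because of libcurl's default behavior: according to the libcurl documentation for CURLOPT_WRITEFUNCTION, when no write callback is set, libcurl uses an internal function that simply fwrite()s the received data to the FILE * given as CURLOPT_WRITEDATA. Presumably f@ref carries exactly such a FILE * (an assumption based on the CFILE description in the question). A minimal C sketch of an equivalent, binary-safe callback; the name fwrite_writer is hypothetical:

```c
#include <stdio.h>

/* Binary-safe write callback with libcurl's write-callback signature.
   'stream' is whatever was passed as CURLOPT_WRITEDATA; here it is assumed
   to be a FILE * opened in binary mode ("wb"). */
size_t fwrite_writer(void *ptr, size_t size, size_t nmemb, void *stream)
{
    FILE *out = (FILE *) stream;
    /* Write exactly size * nmemb bytes; embedded zero bytes are preserved.
       A short return value tells libcurl the write failed. */
    return fwrite(ptr, 1, size * nmemb, out);
}
```

Because it writes a byte count with fwrite rather than treating the chunk as text, zero bytes in JPEG or PDF data survive intact.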

For more elaborate writing, I'm not sure this is the best way, but on Linux

```sh
man curl_easy_setopt
```

tells me that there's a curl option, CURLOPT_WRITEFUNCTION, that is a pointer to a C function with prototype

```c
size_t function(void *ptr, size_t size, size_t nmemb, void *stream);
```

and in R, at the end of ?curlPerform, there's an example of calling a C function as the 'writefunction' option. So I created a file curl_writer.c:

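```c
#include <stdio.h>

/* Write callback that only reports each chunk's size on stderr; the data
   itself is discarded, but returning size * nmemb tells libcurl it was handled. */
size_t
writer(void *buffer, size_t size, size_t nmemb, void *stream)
{
    fprintf(stderr, "<writer> size = %d, nmemb = %d\n",
            (int) size, (int) nmemb);
    return size * nmemb;
}
```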

Compiled it

```sh
R CMD SHLIB curl_writer.c
```

which on Linux produces a file curl_writer.so, and then in R

```r
dyn.load("curl_writer.so")
writer <- getNativeSymbolInfo("writer", PACKAGE = "curl_writer")$address
curlPerform(URL = url, writefunction = writer)
```

and get on stderr

```
<writer> size = 1, nmemb = 2653
<writer> size = 1, nmemb = 520
OK
```
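In that run the body arrived in two calls totalling 2653 + 520 = 3173 bytes, which is the general pattern: libcurl delivers data in chunks, and the callback acknowledges each one through its return value. Per the libcurl documentation, returning any count other than size * nmemb (apart from the special CURL_WRITEFUNC_PAUSE value) makes libcurl abort the transfer with CURLE_WRITE_ERROR, while stream is simply the CURLOPT_WRITEDATA pointer passed through untouched. A minimal sketch, using a hypothetical byte_counter struct as that pointer, that counts bytes and deliberately aborts past a limit:

```c
#include <stddef.h>

/* Hypothetical per-transfer state, passed to libcurl as CURLOPT_WRITEDATA. */
struct byte_counter {
    size_t total;   /* bytes accepted so far */
    size_t limit;   /* refuse the chunk that would push total past this */
};

size_t counting_writer(void *ptr, size_t size, size_t nmemb, void *stream)
{
    struct byte_counter *bc = (struct byte_counter *) stream;
    size_t bytes = size * nmemb;
    (void) ptr;                      /* payload is counted, not stored */
    if (bc->total + bytes > bc->limit)
        return 0;                    /* short count: libcurl aborts the transfer */
    bc->total += bytes;
    return bytes;                    /* acknowledge the whole chunk */
}
```

Returning a short count is a simple way for a callback to stop a download early, for example when a size cap is exceeded.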

These two ideas can be integrated, i.e., writing to an arbitrary file using an arbitrary function, by modifying the C function to use the FILE * we pass in (as written, it puts the size report, not the downloaded bytes, into that file):

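```c
#include <stdio.h>

/* Same callback, but the report goes to the FILE * passed as writedata
   (libcurl hands it over as 'stream') instead of to stderr. */
size_t
writer(void *buffer, size_t size, size_t nmemb, void *stream)
{
    FILE *fout = (FILE *) stream;
    fprintf(fout, "<writer> size = %d, nmemb = %d\n",
            (int) size, (int) nmemb);
    fflush(fout);
    return size * nmemb;
}
```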

and then, back in R after compiling:

```r
dyn.load("curl_writer.so")
writer <- getNativeSymbolInfo("writer", PACKAGE = "curl_writer")$address
f <- CFILE(filename <- tempfile(), "wb")
curlPerform(URL = url, writedata = f@ref, writefunction = writer)
close(f)
```

getURL can be used here, too, provided both writedata = f@ref and write = writer are given. I think the problem in the original question is that R_curl_write_binary_data is really an internal function that writes to a buffer managed by RCurl, not to a file handle like the one created by CFILE. Likewise, specifying writedata without write (which, judging from the getURL source code, seems to be an alias for writefunction) sends a pointer to a file to a function that expects a pointer to something else, which is why getURL needs both writedata and write.
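To make the "buffer managed by RCurl" point concrete: a callback that gathers a download in memory typically casts its stream argument to its own bookkeeping struct and appends each chunk there. The mem_buffer struct and memory_writer function below are hypothetical illustrations, not RCurl's actual R_curl_write_binary_data; they only show why such a function cannot be handed a FILE * from CFILE, since it would reinterpret the FILE object as its own struct.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, passed to libcurl as CURLOPT_WRITEDATA. */
struct mem_buffer {
    unsigned char *data;   /* heap block holding all bytes received so far */
    size_t len;            /* number of valid bytes in data */
};

size_t memory_writer(void *ptr, size_t size, size_t nmemb, void *stream)
{
    struct mem_buffer *mb = (struct mem_buffer *) stream;
    size_t bytes = size * nmemb;
    if (bytes == 0)
        return 0;                       /* nothing to append (empty body) */
    unsigned char *grown = realloc(mb->data, mb->len + bytes);
    if (grown == NULL)
        return 0;                       /* out of memory: report failure */
    mb->data = grown;
    memcpy(mb->data + mb->len, ptr, bytes);  /* append this chunk */
    mb->len += bytes;
    return bytes;
}
```

After the transfer the caller owns mb.data (mb.len bytes) and must free() it; with a FILE * writer there is no such in-memory copy, which is the point of the CFILE approach.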

