R download.file with the "wget" method and extra wget options


Question

I have a probably rather basic question about using the download.file function in R with the "wget" method and some of wget's extra options, but I just cannot get it to work.

What I want to do: download a local copy of a webpage (actually several webpages, but for now the challenge is to get it to work even with one).

Challenge: I need the local copy to look exactly like the online version, which also means including links, icons, etc. I found wget to be a good tool for this, and I would like to specify some of its extra options, such as --random-wait, -p and -r. I found some very helpful tutorials on this, but none of them used the extra options from within R; they all called wget directly.

Here is the code I have put together for this:

download.file('https://www.wikipedia.org/', destfile = "wikipage", method = "wget", extra = getOption("--random wait", "-r", "-p"))

This does not work. I suspect there are problems with both the "wget" method and the way I specify the extras.

Can anyone help? It would be much appreciated.

A bonus question: I know that destfile is supposed to specify a file name for the downloaded document, but is there any way to specify a folder, via a path, into which all downloaded files should be saved?

Thank you in advance!

Best, Caroline

Recommended answer

You can specify multiple options directly in the extra argument, without getOption(). getOption() looks up an R session option by name; it is not a way to pass flags to wget.

Further, you can simply include the path to the directory where you want to save the downloaded file in destfile.

download.file('https://www.wikipedia.org/', destfile = "mydirectory/wikipage.html", method = "wget", extra = "-r -p --random-wait")

You will, however, run into the problem that wget attempts to save all downloaded items into the same destfile.
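For a single page, the call above can be wrapped as in the following minimal sketch. It assumes that wget is on the PATH and that only one file is wanted, so -r and -p are left out (as far as I understand, download.file() hands destfile to wget as its single output file, which is why recursive downloads end up in one file). `download_page` is a hypothetical helper name, not part of base R, and --no-verbose is just an illustrative second flag.

```r
# Sketch: fetch a single page with download.file() and extra wget options.
# `download_page` is a hypothetical helper, not part of base R.
download_page <- function(url, destfile,
                          extra = c("--random-wait", "--no-verbose")) {
  # method = "wget" needs the wget executable to be found on the PATH.
  if (!nzchar(Sys.which("wget"))) {
    stop("wget was not found on the PATH")
  }
  # The target directory must exist before wget writes to destfile.
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  # `extra` is a character vector of additional wget arguments.
  download.file(url, destfile = destfile, method = "wget", extra = extra)
}

# Example call (needs network access and an installed wget):
# download_page("https://www.wikipedia.org/", "mydirectory/wikipage.html")
```

Passing the flags as a character vector keeps each option separate and avoids the getOption() detour entirely.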

Note that there was a similar question a while ago (I only saw it just now). The suggested solution was to use system() instead of download.file to run the wget command. Adapted to your problem:

setwd("./mydirectory")
system("wget http://www.wikipedia.org -p -k --random-wait")
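A variant of the same idea that avoids changing the working directory is sketched below. It relies on wget's -P (--directory-prefix) option to choose the output folder, which also addresses the bonus question. The helper names `wget_mirror_args` and `mirror_page` are hypothetical, and the sketch assumes wget is installed.

```r
# Sketch: run wget via system2() and let wget's -P option pick the folder.
# `wget_mirror_args` and `mirror_page` are hypothetical helper names.
wget_mirror_args <- function(url, dest_dir) {
  # -p: page requisites, -k: convert links for local viewing,
  # -P: directory prefix under which wget stores everything it downloads.
  c("-p", "-k", "--random-wait", "-P", shQuote(dest_dir), shQuote(url))
}

mirror_page <- function(url, dest_dir) {
  wget <- unname(Sys.which("wget"))
  if (!nzchar(wget)) {
    stop("wget was not found on the PATH")
  }
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  # Returns wget's exit status (0 on success).
  system2(wget, args = wget_mirror_args(url, dest_dir))
}

# Example call (needs network access and an installed wget):
# mirror_page("http://www.wikipedia.org", "mydirectory")
```

Because the folder is passed to wget itself, the R session's working directory stays untouched, which matters when the call sits inside a longer script.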

Please also note that both commands only work on systems with wget installed. On Linux/BSD/Mac, the package to install is usually called wget. On Windows, wget is (according to the download.file() help) available from packages such as gnuwin32 and Cygwin. Even then, the system() command may still fail if the system does not know where the wget executable is. In that case, you may need to specify the absolute path to the wget executable.
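One way to check this from within R is sketched below, using Sys.which() to look wget up on the PATH and falling back to a user-supplied absolute path. `find_wget` is a hypothetical helper, and the Windows path in the usage comment is only an assumed example location, not something taken from the answer.

```r
# Sketch: locate the wget executable, optionally via an absolute fallback path.
# `find_wget` is a hypothetical helper name.
find_wget <- function(fallback = "") {
  # Sys.which() returns "" when the program is not on the PATH.
  path <- unname(Sys.which("wget"))
  if (nzchar(path)) {
    return(path)
  }
  # Otherwise accept the fallback only if that file really exists.
  if (nzchar(fallback) && file.exists(fallback)) {
    return(fallback)
  }
  ""
}

# Example use; the Windows path is an assumed install location:
# wget <- find_wget("C:/Program Files (x86)/GnuWin32/bin/wget.exe")
# if (nzchar(wget)) {
#   system(paste(shQuote(wget), "http://www.wikipedia.org -p -k --random-wait"))
# }
```

Note that this only helps the system() route; download.file(method = "wget") calls wget by name, so for that route the executable's directory has to be on the PATH.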
