How can I configure future to download more files?


Problem Description

I have a lot of files I need to download.

I am using download.file() function and furrr::map to download in parallel, with plan(strategy = "multicore").

Please advise how can I load more jobs for each future?

Running on Ubuntu 18.04 with 8 cores. R version 3.5.3.

The files can be txt, zip or any other format. Size varies in range of 5MB - 40MB each.

Recommended Answer

Using furrr works just fine. I think what you mean is furrr::future_map. Using multicore substantially increases the download speed. (Note: multicore is not available on Windows, only multisession; use multiprocess if you are unsure which platform your code will run on.)

library(furrr)
#> Loading required package: future

csv_file <- "https://raw.githubusercontent.com/UofTCoders/rcourse/master/data/iris.csv"

download_template <- function(.x) {
    # Download the example CSV into a uniquely named temporary file.
    temp_file <- tempfile(pattern = paste0("dl-", .x, "-"), fileext = ".csv")
    download.file(url = csv_file, destfile = temp_file)
}

# Sequential baseline: five downloads, one after another.
download_normal <- function() {
    for (i in 1:5) {
        download_template(i)
    }
}

# Parallel downloads using forked processes (Unix-alikes only).
download_future_core <- function() {
    plan(multicore)
    future_map(1:5, download_template)
}

# Parallel downloads using separate background R sessions.
download_future_session <- function() {
    plan(multisession)
    future_map(1:5, download_template)
}

library(microbenchmark)

microbenchmark(
    download_normal(),
    download_future_core(),
    download_future_session(),
    times = 3
)
#> Unit: milliseconds
#>                       expr       min        lq      mean    median        uq       max neval
#>          download_normal()  931.2587  935.0187  937.2114  938.7787  940.1877  941.5968     3
#>     download_future_core()  433.0860  435.1674  488.5806  437.2489  516.3279  595.4069     3
#>  download_future_session() 1894.1569 1903.4256 1919.1105 1912.6942 1931.5873 1950.4803     3

Created by the reprex package (v0.2.1)

Keep in mind, I am using Ubuntu, so using Windows will likely change things, since as far as I understand future doesn't allow multicore on Windows.
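The platform caveat above can be handled with a small OS check when setting the plan. This helper is an illustrative sketch, not part of the original answer:

```r
# Illustrative sketch: pick a parallel strategy that works on the current OS.
library(future)

if (.Platform$OS.type == "windows") {
  plan(multisession)  # forked processes are not supported on Windows
} else {
  plan(multicore)     # forked workers are available on Unix-alikes
}
```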

I am just guessing here, but the reason that multisession is slower could be because it has to open up several R sessions before running the download.file function. I was just downloading a very small dataset (iris.csv), so maybe on larger datasets that take more time, the time taken to open an R session would be offset by the time it takes to download larger files.

Minor update:

You can pass a vector of URLs to the datasets into future_map so that each file is downloaded as scheduled by the future package:

data_urls <- c("https:.../data.csv", "https:.../data2.csv")
# download.file() has no default destfile, so supply one per URL.
dest_files <- c("data.csv", "data2.csv")

library(furrr)
plan(multiprocess)
future_map2(data_urls, dest_files, download.file)
# Or use walk if you only need the side effect:
# future_walk2(data_urls, dest_files, download.file)
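On the original question of loading more jobs into each future: newer versions of furrr (I believe 0.2.0 and later, an assumption worth checking against your installed version) expose a scheduling control, so one future can handle a whole batch of URLs instead of a single file. The URLs and file names below are placeholders:

```r
# Sketch, assuming furrr >= 0.2.0 where furrr_options() is available.
library(furrr)
plan(multisession)

urls  <- sprintf("https://example.com/file%02d.csv", 1:40)  # hypothetical URLs
dests <- sprintf("file%02d.csv", 1:40)

# scheduling = 1 sends one chunk of work to each worker, so each
# future downloads many files rather than one future per file.
future_walk2(
  urls, dests, download.file,
  .options = furrr_options(scheduling = 1)
)
```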
