Skip to the next iteration when download.file takes too long
Question
I have been trying to skip an iteration of download.file that takes too long, without success, even though I've tried several similar answers to my problem. I've set up an example below with the code I have been using. My main problem is that some of the IDs (from the vec object below) that I use to fetch the .csv files have no related .csv file, so the URL never responds adequately -- I believe the call keeps waiting for a response that never comes, and the loop starts taking too long. How can I skip an ID when download.file starts taking too long?
library(stringr)
library(R.utils)
vec=c("05231992000181","00628708000191","05816554000185", "01309949000130","07098414000144", "07299568000102", "12665438000178", "63599658000181", "12755123000111", "12376766000154",
"11890564000163", "04401095000106", "11543768000128", "10695634000160", "34931022000197", "10422225000190",
"09478854000152", "12682106000100", "11581441000140", "10545688000149", "10875891000183", "13095498000165",
"10809607000170", "07976466000176", "11422211000139", "41205907000174", "08326720000153", "06910908000119",
"04196935000227", "02323120000155", "96560701000154")
for (i in seq_along(vec)) {
  url <- paste0("http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor=", vec[i])
  tryCatch(
    expr = {
      evalWithTimeout(
        download.file(url,
                      destfile = paste0("C:/Users/Username/Desktop/example_file/", vec[i], ".csv"),
                      mode = "wb"),
        timeout = 3)
    },
    error = function(ex) cat("Timeout. Skipping.\n"))
  print(i)
}
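One note on the code above: evalWithTimeout() is deprecated in R.utils in favor of withTimeout(), which signals a TimeoutException condition when the limit is exceeded. A sketch of the same loop with that change, catching the timeout explicitly so the iteration is skipped (paths and the 3-second limit are taken from the question):

```r
library(R.utils)

for (i in seq_along(vec)) {
  url <- paste0("http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor=", vec[i])
  tryCatch(
    withTimeout(
      download.file(url,
                    destfile = paste0("C:/Users/Username/Desktop/example_file/", vec[i], ".csv"),
                    mode = "wb"),
      timeout = 3),
    # withTimeout() signals a TimeoutException when the limit is hit
    TimeoutException = function(ex) cat("Timeout on", vec[i], "-- skipping.\n"),
    # any other failure (e.g. HTTP error) is also skipped
    error = function(ex) cat("Error on", vec[i], "-- skipping.\n"))
  print(i)
}
```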
推荐答案
When possible, checking the HTTP status is an efficient way to deal with this situation, but if the server is not responding, you can set a timeout with httr::timeout, passed to httr::GET. Keeping everything in neat data frame list columns via the tidyverse:
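As a minimal illustration of the status-code idea for a single request (the ID used here is just the first one from the question), the response can be checked before reading the body:

```r
library(httr)

url <- paste0("http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor=",
              "05231992000181")

# timeout(3) aborts the request (with an error) if it takes longer than 3 seconds
resp <- tryCatch(GET(url, timeout(3)), error = function(e) NULL)

# proceed only when the request completed and the server returned 200 OK
if (!is.null(resp) && status_code(resp) == 200) {
  dat <- content(resp)  # httr parses the CSV body into a data frame
}
```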
library(dplyr)
library(purrr)
base_url <- "http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor="

df <- data_frame(cnpj_vencedor = c(
  "05231992000181", "00628708000191", "05816554000185", "01309949000130",
  "07098414000144", "07299568000102", "12665438000178", "63599658000181",
  "12755123000111", "12376766000154", "11890564000163", "04401095000106",
  "11543768000128", "10695634000160", "34931022000197", "10422225000190",
  "09478854000152", "12682106000100", "11581441000140", "10545688000149",
  "10875891000183", "13095498000165", "10809607000170", "07976466000176",
  "11422211000139", "41205907000174", "08326720000153", "06910908000119",
  "04196935000227", "02323120000155", "96560701000154"))
df <- df %>%
  # iterate GET over URLs, modified by `purrr::safely` to return a list of
  # the result and the error (NULL where appropriate), with timeout set
  mutate(response = map(paste0(base_url, cnpj_vencedor),
                        safely(httr::GET), httr::timeout(3)))

df <- df %>%
  # extract response (drop errors)
  mutate(response = map(response, 'result'),
         # where there is a response, extract its data
         data = map_if(response, negate(is.null), httr::content))
df
#> # A tibble: 31 x 3
#> cnpj_vencedor response data
#> <chr> <list> <list>
#> 1 05231992000181 <S3: response> <tibble [49 × 18]>
#> 2 00628708000191 <S3: response> <NULL>
#> 3 05816554000185 <S3: response> <tibble [1 × 18]>
#> 4 01309949000130 <S3: response> <NULL>
#> 5 07098414000144 <NULL> <NULL>
#> 6 07299568000102 <NULL> <NULL>
#> 7 12665438000178 <NULL> <NULL>
#> 8 63599658000181 <NULL> <NULL>
#> 9 12755123000111 <NULL> <NULL>
#> 10 12376766000154 <NULL> <NULL>
#> # ... with 21 more rows
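Since the original goal was .csv files on disk, the non-NULL data frames collected in df$data could be written out afterwards. A sketch using purrr::walk2 and readr::write_csv (the output directory is an assumption; adjust the path as needed):

```r
library(purrr)
library(readr)

# write each successfully parsed table to <cnpj>.csv; skip the NULL entries
walk2(df$data, df$cnpj_vencedor, function(tbl, id) {
  if (!is.null(tbl)) write_csv(tbl, paste0(id, ".csv"))
})
```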