Skip to the next iteration when download.file takes too long

Problem Description

I have been trying to skip an iteration of download.file that takes too long, but it is not working as expected, even though I have tried several similar answers to my problem. I've set up an example below with the code I have been using. My main problem is that some of the IDs (from the vec object below) that I am using to fetch the .csv files do not have a related .csv file, so the URL never responds properly: I believe download.file keeps waiting for a response that never arrives, and the loop stalls. How can I skip an ID if download.file starts taking too long?

library(stringr)
library(R.utils)    

vec=c("05231992000181","00628708000191","05816554000185", "01309949000130","07098414000144", "07299568000102", "12665438000178", "63599658000181", "12755123000111", "12376766000154",
      "11890564000163", "04401095000106", "11543768000128", "10695634000160", "34931022000197", "10422225000190",
      "09478854000152", "12682106000100", "11581441000140", "10545688000149", "10875891000183", "13095498000165",
      "10809607000170", "07976466000176", "11422211000139", "41205907000174", "08326720000153", "06910908000119",
      "04196935000227", "02323120000155", "96560701000154")


for (i in seq_along(vec)) {

  url = paste0("http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor=", vec[i])

  tryCatch(expr = {evalWithTimeout(download.file(url, 
                                                 destfile = paste0("C:/Users/Username/Desktop/example_file/",vec[i],".csv"),  
                                                 mode="wb"), timeout=3)},
           error=function(ex) cat("Timeout. Skipping.\n"))

  print(i)
}

Recommended Answer

When possible, checking the HTTP status is an efficient way to deal with this situation, but if the server is not responding at all, you can set a timeout with httr::timeout, passed to httr::GET. Keeping everything in tidy data frame list columns via the tidyverse:

library(dplyr)
library(purrr)

base_url <- "http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor="
df <- data_frame(cnpj_vencedor = c(
  "05231992000181", "00628708000191", "05816554000185", "01309949000130",
  "07098414000144", "07299568000102", "12665438000178", "63599658000181",
  "12755123000111", "12376766000154", "11890564000163", "04401095000106",
  "11543768000128", "10695634000160", "34931022000197", "10422225000190",
  "09478854000152", "12682106000100", "11581441000140", "10545688000149",
  "10875891000183", "13095498000165", "10809607000170", "07976466000176",
  "11422211000139", "41205907000174", "08326720000153", "06910908000119",
  "04196935000227", "02323120000155", "96560701000154"))

df <- df %>% 
    # iterate GET over URLs, modified by `purrr::safely` to return a list of 
    # the result and the error (NULL where appropriate), with timeout set
    mutate(response = map(paste0(base_url, cnpj_vencedor), 
                          safely(httr::GET), httr::timeout(3)))

df <- df %>% 
           # extract response (drop errors)
    mutate(response = map(response, 'result'),
           # where there is a response, extract its data 
           data = map_if(response, negate(is.null), httr::content))

df
#> # A tibble: 31 x 3
#>    cnpj_vencedor  response       data              
#>    <chr>          <list>         <list>            
#>  1 05231992000181 <S3: response> <tibble [49 × 18]>
#>  2 00628708000191 <S3: response> <NULL>            
#>  3 05816554000185 <S3: response> <tibble [1 × 18]> 
#>  4 01309949000130 <S3: response> <NULL>            
#>  5 07098414000144 <NULL>         <NULL>            
#>  6 07299568000102 <NULL>         <NULL>            
#>  7 12665438000178 <NULL>         <NULL>            
#>  8 63599658000181 <NULL>         <NULL>            
#>  9 12755123000111 <NULL>         <NULL>            
#> 10 12376766000154 <NULL>         <NULL>            
#> # ... with 21 more rows
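
From here you can pull the successful results back out of the list column. A minimal follow-up sketch, assuming each parsed tibble shares the same 18 columns shown above and reusing the destination folder from the question (both are assumptions, not part of the original answer):

library(tidyr)
library(readr)

# stack all non-empty results into one data frame, keyed by the ID
results <- df %>%
  filter(!map_lgl(data, is.null)) %>%
  select(cnpj_vencedor, data) %>%
  unnest(data)

# or write one .csv per ID, mirroring the destfile used with download.file()
df %>%
  filter(!map_lgl(data, is.null)) %>%
  select(cnpj_vencedor, data) %>%
  pwalk(function(cnpj_vencedor, data) {
    write_csv(data, paste0("C:/Users/Username/Desktop/example_file/",
                           cnpj_vencedor, ".csv"))
  })

As a side note, if you would rather keep the download.file() loop from the question, the base download methods honour options(timeout = ...), so a request that hangs eventually errors and tryCatch() can move on to the next ID. A rough sketch of that idea, reusing vec from the question (the 3-second limit and the libcurl method are assumptions; adjust as needed):

old_timeout <- getOption("timeout")
options(timeout = 3)  # cap how long download.file() may wait (assumed limit)

for (i in seq_along(vec)) {
  url <- paste0("http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor=", vec[i])
  tryCatch(
    download.file(url,
                  destfile = paste0("C:/Users/Username/Desktop/example_file/", vec[i], ".csv"),
                  method = "libcurl", mode = "wb"),
    # a timeout or a missing file both raise an error; report it and keep going
    error = function(e) cat("Timed out or failed, skipping", vec[i], "\n")
  )
}

options(timeout = old_timeout)  # restore the previous timeout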
