Apply function that downloads zip files and deletes specific files


Problem description

I am trying to write a function and call it with apply on each row of my dataset. The dataset contains URLs of zip files. Each file should be downloaded and unzipped, and after unzipping, the TXT and zip files should be deleted from the working directory.

head(data)
                                                 data                                                                   URL
1 /files/market_valuation/ru/2017/val170502170509.zip http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip
2 /files/market_valuation/ru/2017/val170424170430.zip http://www.kase.kz/files/market_valuation/ru/2017/val170424170430.zip
3 /files/market_valuation/ru/2017/val170417170423.zip http://www.kase.kz/files/market_valuation/ru/2017/val170417170423.zip
4 /files/market_valuation/ru/2017/val170410170416.zip http://www.kase.kz/files/market_valuation/ru/2017/val170410170416.zip
5 /files/market_valuation/ru/2017/val170403170409.zip http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip
6 /files/market_valuation/ru/2017/val170327170402.zip http://www.kase.kz/files/market_valuation/ru/2017/val170327170402.zip


My function:

Price_KASE <- function(data){
    URL = data[,2]
    dir = basename(URL)
    download.file(URL, dir)
    unzip(dir)
    TXT <- list.files(pattern = "*.TXT")
    zip <- list.files(pattern = "*.zip")
    file.remove(TXT, zip)
}

apply(data, 1, Price_KASE(data))

And the error message:

Error in download.file(URL, dir) : 
  'url' must be a length-one character vector

Please explain what is wrong with my code and how I can fix it. Thank you.

Using a for loop as an alternative:

for (i in 1:length(data[,2])){
    URL = data[i, 2]
    dir = basename(URL)
    download.file(URL, dir)
    unzip(dir)
    TXT <- list.files(pattern = "*.TXT")
    zip <- list.files(pattern = "*.zip")
    file.remove(TXT, zip)
}

It seems to work, but after the 4th or 5th file I get:

In download.file(URL, dir) :
  cannot open URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170410170416.zip': HTTP status was '503 Service Temporarily Unavailable'
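A 503 status usually means the server is temporarily refusing requests; whether that is caused by many downloads arriving in quick succession is an assumption, and the thread does not confirm it. A common mitigation is to pause between downloads and retry a failed one a few times. The sketch below wraps download.file in tryCatch. The name download_with_retry and its tries, wait and fetch parameters are hypothetical; fetch exists only so the retry logic can be exercised with a stub instead of a real network request.

```r
# Hypothetical helper: retry a download a few times, pausing between attempts.
# Assumes the 503 is transient; 'fetch' defaults to utils::download.file and
# can be replaced by a stub for testing.
download_with_retry <- function(url, destfile, tries = 3, wait = 5,
                                fetch = utils::download.file) {
  for (attempt in seq_len(tries)) {
    ok <- tryCatch({
      # mode = "wb" writes the zip archive in binary mode
      fetch(url, destfile, mode = "wb")
      TRUE
    }, error = function(e) {
      # download.file() signals an error when the URL cannot be opened
      message("Attempt ", attempt, " failed: ", conditionMessage(e))
      FALSE
    })
    if (ok) return(invisible(TRUE))
    if (attempt < tries) Sys.sleep(wait)  # back off before the next attempt
  }
  stop("giving up on ", url, " after ", tries, " attempts")
}

# Usage in the loop from the question, with a short pause between files
# (not run here because it downloads real files):
# for (i in seq_len(nrow(data))) {
#   URL <- as.character(data[i, 2])
#   download_with_retry(URL, basename(URL))
#   Sys.sleep(2)
# }
```

The fixed pause is the simplest choice; if the server keeps answering 503, increasing wait on each attempt is a straightforward extension.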

Recommended answer

I think your URLs are stored as factor variables in your data frame. Try:

data[,2] <- as.character(data[,2])

If you are reading the data from a .csv file or constructing the data frame yourself, consider setting stringsAsFactors = FALSE.
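To see the difference, the snippet below builds the same column as a factor and as character, using the column name URL and two URLs from the question. stringsAsFactors = TRUE is set explicitly because its default changed from TRUE to FALSE in R 4.0.0, so scripts written for older versions may get factors without asking for them. Reading from an inline string with read.csv(text = ...) is only a stand-in for reading a real .csv file.

```r
urls <- c("http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip",
          "http://www.kase.kz/files/market_valuation/ru/2017/val170424170430.zip")

# Factor column: download.file() expects a character string, not a factor
as_factor <- data.frame(URL = urls, stringsAsFactors = TRUE)
is.factor(as_factor$URL)

# Fix 1: convert the existing column to character
as_factor$URL <- as.character(as_factor$URL)
is.character(as_factor$URL)

# Fix 2: keep strings as character when the data frame is created or read
as_chr <- data.frame(URL = urls, stringsAsFactors = FALSE)
from_csv <- read.csv(text = paste0("URL\n", paste(urls, collapse = "\n")),
                     stringsAsFactors = FALSE)
is.character(as_chr$URL) && is.character(from_csv$URL)
```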

UPDATE:

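I noticed something else: with MARGIN = 1, apply passes each row to the function as a single vector (apply first converts the data frame to a matrix, here a character matrix), not as a one-row data frame. So you also have to change your function to index that vector with data[2] instead of data[,2]; the key change is the line URL = data[2]. Also pass the function itself to apply, as in apply(data1, 1, Price_KASE), rather than the call Price_KASE(data): the call is evaluated on the whole data frame when apply looks up its FUN argument, before any row is processed, so data[,2] holds every URL at once and download.file stops with the "'url' must be a length-one character vector" error. The code below runs to completion on a one-row example and gives the output shown.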
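A quick way to check what apply actually hands to the function is to pass it one that only reports the class and length of its argument. The helper describe and the two-row data frame below are illustrative stand-ins; nothing is downloaded.

```r
# Two-row stand-in for the question's data, with the same column names
df <- data.frame(
  data = c("/files/a.zip", "/files/b.zip"),
  URL  = c("http://example.com/files/a.zip", "http://example.com/files/b.zip"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: report the class and length of whatever it receives
describe <- function(x) paste(class(x), length(x))

describe(df)            # what Price_KASE(data) sees: "data.frame 2"
apply(df, 1, describe)  # what FUN sees per row: "character 2" "character 2"

# Inside FUN, row[2] is that row's URL, a single string
apply(df, 1, function(row) basename(row[2]))
```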

data1 <- data.frame(a = "/files/market_valuation/ru/2017/val170502170509.zip",
                b = "http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip")


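Price_KASE <- function(data){
  URL = data[2]   # each row arrives as a plain vector, so index with [2], not [,2]
  dir = basename(URL)
  download.file(URL, dir)
  unzip(dir)
  # list.files() takes a regular expression, so match the extension literally
  TXT <- list.files(pattern = "\\.TXT$")
  zip <- list.files(pattern = "\\.zip$")
  file.remove(TXT, zip)
}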

data1$b <- as.character(data1$b)

apply(data1, 1, Price_KASE)

#     [,1]
#[1,] TRUE
#[2,] TRUE
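A variant that avoids deleting unrelated .TXT or .zip files already in the working directory is sketched below, assuming the URL is in column 2 and that only the extracted .TXT files and the downloaded zip should be removed. It relies on unzip() invisibly returning the paths of the files it extracted, and it matches "\\.txt$" case-insensitively because list.files() and grepl() take regular expressions, not shell globs like "*.TXT". The names price_kase_row and txt_and_zip, and the exdir parameter, are hypothetical.

```r
# Hypothetical helper: which paths to delete after unzipping.
# Keeps only extracted files ending in .txt/.TXT, plus the zip itself.
txt_and_zip <- function(extracted, zipfile) {
  c(extracted[grepl("\\.txt$", extracted, ignore.case = TRUE)], zipfile)
}

# Hypothetical per-row worker for apply(data, 1, price_kase_row).
# 'row' is the character vector that apply() passes for each row.
price_kase_row <- function(row, exdir = ".") {
  url <- as.character(row[2])                 # column 2 holds the URL
  zipfile <- file.path(exdir, basename(url))
  download.file(url, zipfile, mode = "wb")    # binary mode for the zip archive
  extracted <- unzip(zipfile, exdir = exdir)  # paths of the extracted files
  file.remove(txt_and_zip(extracted, zipfile))
}

# Usage (downloads real files, so not run here):
# results <- apply(data, 1, price_kase_row)
```

Deleting by the paths unzip() reports, rather than by listing the whole directory, means files left over from earlier runs or unrelated downloads are not touched.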
