Apply function that downloads zip files and deletes specific files


Problem description

I am trying to write a function and call it with apply on each row of my dataset. The dataset contains URLs of zip files. Each file should be downloaded and unzipped, and after unzipping the TXT and zip files should be deleted from the working directory.

head(data)
                                                 data                                                                   URL
1 /files/market_valuation/ru/2017/val170502170509.zip http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip
2 /files/market_valuation/ru/2017/val170424170430.zip http://www.kase.kz/files/market_valuation/ru/2017/val170424170430.zip
3 /files/market_valuation/ru/2017/val170417170423.zip http://www.kase.kz/files/market_valuation/ru/2017/val170417170423.zip
4 /files/market_valuation/ru/2017/val170410170416.zip http://www.kase.kz/files/market_valuation/ru/2017/val170410170416.zip
5 /files/market_valuation/ru/2017/val170403170409.zip http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip
6 /files/market_valuation/ru/2017/val170327170402.zip http://www.kase.kz/files/market_valuation/ru/2017/val170327170402.zip

My function:

Price_KASE <- function(data){
    URL = data[,2]
    dir = basename(URL)
    download.file(URL, dir)
    unzip(dir)
    TXT <- list.files(pattern = "*.TXT")
    zip <- list.files(pattern = "*.zip")
    file.remove(TXT, zip)
}

    apply(data, 1, Price_KASE(data))

And the error message:

Error in download.file(URL, dir) : 
  'url' must be a length-one character vector

Please explain what is wrong with my code and how I can fix it. Thank you.

Alternative way using a for loop:

for (i in 1:length(data[,2])){
    URL = data[i, 2]
    dir = basename(URL)
    download.file(URL, dir)
    unzip(dir)
    TXT <- list.files(pattern = "*.TXT")
    zip <- list.files(pattern = "*.zip")
    file.remove(TXT, zip)
}

It seems to work OK, but after the 4th or 5th file I get: In download.file(URL, dir) : cannot open URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170410170416.zip': HTTP status was '503 Service Temporarily Unavailable'

Recommended answer

I think the URLs in your data frame are stored as a factor variable. Try using:

data[,2] <- as.character(data[,2])

If you are reading the data from a .csv file or constructing the data frame yourself, consider setting stringsAsFactors = FALSE.
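To illustrate the difference, here is a minimal sketch using a made-up one-row data frame (the names df_factor, df_char and url_text are assumptions for illustration only). Note that since R 4.0.0 the default of stringsAsFactors is already FALSE, so the factor case mainly arises in older R versions or when the option is set explicitly.

```r
# Hypothetical one-row data frames; the object names are made up for illustration.
u <- "http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip"

df_factor <- data.frame(URL = u, stringsAsFactors = TRUE)   # URL stored as a factor
df_char   <- data.frame(URL = u, stringsAsFactors = FALSE)  # URL stored as text

class(df_factor$URL)  # "factor"
class(df_char$URL)    # "character"

# as.character() recovers the text from the factor column,
# after which string functions such as basename() behave as expected.
url_text <- as.character(df_factor$URL)
basename(url_text)    # "val170502170509.zip"
```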

Update:

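I noticed something else: with 1 as the margin, apply passes each row to the function as a single vector, not as a one-row data frame. So you also have to change your function to index with data[2] instead of data[,2]. Note too that apply expects the function itself (Price_KASE), not the result of calling it (Price_KASE(data)); calling it with the whole data frame hands the entire URL column to download.file, which is what triggers the length-one error. The data1 example that follows runs completely and gives the output shown.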
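A small sketch of what apply hands to the function, using a made-up two-row data frame (demo, file_names and bad_index are hypothetical names, and the example.com URLs are placeholders that are never downloaded):

```r
# Made-up data with the same two-column layout as the question's data frame.
demo <- data.frame(path = c("/a/one.zip", "/a/two.zip"),
                   URL  = c("http://example.com/a/one.zip",
                            "http://example.com/a/two.zip"),
                   stringsAsFactors = FALSE)

# With MARGIN = 1 each call receives one row as a character vector,
# so single-bracket indexing picks out the URL.
file_names <- apply(demo, 1, function(row) basename(row[2]))
file_names  # the two file names, one.zip and two.zip

# Matrix-style indexing on that vector fails; the error message is captured
# here instead of stopping the script.
bad_index <- tryCatch(apply(demo, 1, function(row) row[, 2]),
                      error = function(e) conditionMessage(e))
bad_index
```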

data1 <- data.frame(a = "/files/market_valuation/ru/2017/val170502170509.zip",
                b = "http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip")


Price_KASE <- function(data){
  URL = data[2]  # changed from data[,2]: apply passes each row as a vector
  dir = basename(URL)
  download.file(URL, dir)
  unzip(dir)
  TXT <- list.files(pattern = "\\.TXT$")  # pattern is a regular expression, not a glob
  zip <- list.files(pattern = "\\.zip$")
  file.remove(TXT, zip)
}

data1$b <- as.character(data1$b)

apply(data1, 1, Price_KASE)

#     [,1]
#[1,] TRUE
#[2,] TRUE
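The 503 error from the for loop in the question is not addressed above. One possible cause is that the server throttles rapid successive requests; that is an assumption, not something the error message confirms. Under that assumption, a minimal sketch would retry each download after a pause. The helpers download_with_retry and fetch_and_clean, and their parameters tries, wait and downloader, are hypothetical names introduced here for illustration.

```r
# Hypothetical helper (not from the original answer): retry a download a few times.
# `downloader` defaults to download.file; it is a parameter only so that the
# retry logic can be exercised without network access.
download_with_retry <- function(url, destfile, tries = 3, wait = 5,
                                downloader = download.file) {
  for (attempt in seq_len(tries)) {
    ok <- tryCatch(
      # download.file returns 0 on success
      identical(as.integer(downloader(url, destfile, mode = "wb")), 0L),
      error = function(e) FALSE,
      warning = function(w) FALSE
    )
    if (ok) return(TRUE)
    if (attempt < tries) Sys.sleep(wait)  # pause before the next attempt
  }
  FALSE
}

# Hypothetical wrapper: download one zip, extract it, then delete only the
# files this call created rather than every TXT and zip file in the directory.
fetch_and_clean <- function(url) {
  zip_file <- basename(url)
  if (!download_with_retry(url, zip_file)) return(FALSE)
  extracted <- unzip(zip_file)  # unzip() returns the paths of the extracted files
  # ... read or process the extracted files here ...
  all(file.remove(extracted, zip_file))
}

# Example call, assuming the URL column is a character vector:
# results <- vapply(data$URL, fetch_and_clean, logical(1))
```

Removing only the paths returned by unzip() avoids deleting unrelated TXT or zip files that happen to be in the working directory.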
