R: unzipping large compressed .csv yields "zip file is corrupt" warning


Question

I am downloading a 78MB zip file from the UN FAO, which contains a 2.66GB csv. I can unzip the downloaded file from a folder using WinZip, but I have been unable to unzip it using unzip() in R:

Warning: this downloads 78MB!

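url <- "http://fenixservices.fao.org/faostat/static/bulkdownloads/FoodBalanceSheets_E_All_Data_(Normalized).zip"
# file.path() inserts the separator itself, so no leading slash is needed
path <- file.path(getwd(), "zipped_data.zip")
# mode = "wb" keeps the binary zip intact on Windows
download.file(url, path, mode = "wb")
unzipped_data <- unzip(path)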

This results in a warning and a failure to unzip the file:

Warning message:
In unzip(path) : zip file is corrupt

In the ?unzip documentation I see:

"It does have some support for bzip2 compression and > 2GB zip files (but not >= 4GB files pre-compression contained in a zip file: like many builds of unzip it may truncate these, in R's case with a warning if possible)"

This makes me believe that unzip() should handle my file. The same process has successfully downloaded, unzipped, and read several other, smaller tables from FAOSTAT. Could the size of my csv be the source of this error? If so, what is the workaround?

Recommended answer

I can't test my solution, and it also depends on your installation, but hopefully it will work or at least point you to a suitable solution:

You can run WinZip from the command line; its documentation shows the structure of the call.

You can also run command lines from R, with system() or shell() (which is just a wrapper for system()).
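As a small illustration of running a command line from R, the sketch below sends a harmless echo command through shell() on Windows and through system() elsewhere. The helper name run_command is made up for this example; intern and the returned exit status are standard behaviour of both functions.

```r
# Hypothetical helper: pick the runner by platform. shell() exists only on
# Windows, where it passes the command through cmd.exe; elsewhere system()
# already hands the command to sh.
run_command <- function(cmd, ...) {
  runner <- if (.Platform$OS.type == "windows") shell else system
  runner(cmd, ...)
}

# intern = TRUE captures what the command prints as a character vector
out <- run_command("echo hello", intern = TRUE)
print(out)

# Without intern, the return value is the exit status (0 usually means success)
status <- run_command("echo hello")
```

Checking the exit status is a cheap way to notice that an external extraction step failed before trying to read its output.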

The general structure of the command line for extraction would be:

winzip32 -e [options] filename[.zip] folder

So we build a string with this structure and your input paths, and wrap it in a function that mimics unzip() with the parameters zipfile and exdir:

unzip_wz <- function(zipfile, exdir) {
  # Create the target folder if it is missing; tweak or remove this line as needed
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  # cmd.exe does not treat single quotes as quoting, so wrap paths in double quotes
  str1 <- sprintf("winzip32 -e %s %s",
                  shQuote(zipfile, type = "cmd"), shQuote(exdir, type = "cmd"))
  # wait = TRUE blocks until extraction finishes; set it to FALSE (with caution)
  # if you want R to keep running while the file is being unzipped
  shell(str1, wait = TRUE)
}

You can try this function in place of unzip(). It assumes that winzip32 is on your system PATH; if it isn't, either add it or replace it with the executable's full path, so that you have something like:

str1 <- sprintf("\"C:/probably/somewhere/in/program/files/winzip32.exe\" -e \"%s\" \"%s\"", zipfile, exdir)
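Whether winzip32 is reachable through the PATH can be checked from R before choosing between the two forms. This is only a sketch: find_executable is a made-up helper name, and it relies on Sys.which(), which returns an empty string when no executable of that name is found.

```r
# Hypothetical helper: return the full path of an executable found on the
# PATH, or NA if R cannot find it.
find_executable <- function(name) {
  found <- unname(Sys.which(name))
  if (nzchar(found)) found else NA_character_
}

wz <- find_executable("winzip32")
if (is.na(wz)) {
  message("winzip32 is not on the PATH; use the full path to winzip32.exe instead")
} else {
  message("winzip32 found at: ", wz)
}
```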

PS: use full paths! The command line doesn't know your R working directory (we could implement that feature in our function if needed).
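One possible way to add that feature is to resolve both paths inside R before building the command string. The sketch below is untested against WinZip itself; build_wz_command is a made-up name, and the winzip32 -e call follows the structure quoted above.

```r
# Hypothetical helper: build the WinZip command from possibly relative paths.
build_wz_command <- function(zipfile, exdir) {
  # Resolve the zip file against R's current working directory; it must exist
  zip_abs <- normalizePath(zipfile, winslash = "\\", mustWork = TRUE)
  # Resolve the target folder too. It should already exist: for a missing
  # directory, normalizePath() may return the input unchanged on some platforms.
  dir_abs <- normalizePath(exdir, winslash = "\\", mustWork = FALSE)
  # type = "cmd" wraps each path in double quotes, as cmd.exe expects
  sprintf("winzip32 -e %s %s",
          shQuote(zip_abs, type = "cmd"),
          shQuote(dir_abs, type = "cmd"))
}
```

The resulting string could then be passed to shell(), for example shell(build_wz_command("zipped_data.zip", "unzipped"), wait = TRUE), where both arguments are illustrative paths relative to the working directory.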
