Using R to download zipped data file, extract, and import .csv
Question
I am trying to download and extract a .csv file from a webpage using R.
This question is closely related to "Using R to download zipped data file, extract, and import data".
I cannot get the solution to work, but it may be due to the web address I am using.
I am trying to download the .csv files from http://data.worldbank.org/country/united-kingdom (under the "Download data" drop-down).
Using @Dirk's solution from the link above, I tried:
temp <- tempfile()
download.file("http://api.worldbank.org/v2/en/country/gbr?downloadformat=csv",temp)
con <- unz(temp, "gbr_Country_en_csv_v2.csv")
dat <- read.table(con, header=T, skip=2)
unlink(temp)
I got the extended link by looking at the page source code, which I expect is causing the problems, although it works if I paste it into the address bar.
The file downloads with the correct size:
download.file("http://api.worldbank.org/v2/en/country/gbr?downloadformat=csv",temp)
# trying URL 'http://api.worldbank.org/v2/en/country/gbr?downloadformat=csv'
# Content type 'application/zip' length 332358 bytes (324 Kb)
# opened URL
# downloaded 324 Kb
# also tried unzip but get this warning
con <- unzip(temp, "gbr_Country_en_csv_v2.csv")
# Warning message:
# In unzip(temp, "gbr_Country_en_csv_v2.csv") :
# requested file not found in the zip file
But these are the file names when I manually download them.
I'd appreciate some help with where I am going wrong. Thanks.
I am using Windows 8, R version 3.1.0.
Recommended answer
In order to get your data to download and uncompress, you need to set mode="wb":
download.file("...",temp, mode="wb")
unzip(temp, "gbr_Country_en_csv_v2.csv")
dd <- read.table("gbr_Country_en_csv_v2.csv", sep=",",skip=2, header=T)
It looks like the default is "w", which assumes a text file. If it were a plain csv file this would be fine. But since it's compressed, it's a binary file, hence the "wb". Without the "wb" part, you can't open the zip at all.
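The steps in this answer can be put together as a minimal sketch. It assumes the download really is a zip archive and that the entry name inside it may differ from the one you expect, so it lists the archive contents instead of hard-coding a name. The helper names `is_zip_file` and `read_csv_from_zip_url`, and the `pattern` and `skip` arguments, are hypothetical and not part of the original answer.

```r
# Sketch only: the helper names below are hypothetical.

# TRUE if the file starts with the zip local-file-header signature "PK\003\004".
# A download that was not written byte-for-byte may fail this check.
is_zip_file <- function(path) {
  magic <- readBin(path, what = "raw", n = 4L)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Download a zip archive in binary mode, pick the first entry whose name
# matches `pattern`, and read it as CSV.
# `pattern` and `skip` are assumptions to adjust for the actual archive.
read_csv_from_zip_url <- function(url, pattern = "\\.csv$", skip = 0) {
  temp <- tempfile(fileext = ".zip")
  on.exit(unlink(temp), add = TRUE)

  # mode = "wb" writes the downloaded bytes unchanged
  download.file(url, temp, mode = "wb")
  if (!is_zip_file(temp)) {
    stop("Downloaded file does not look like a zip archive")
  }

  # List the entries rather than guessing the file name
  entries <- unzip(temp, list = TRUE)$Name
  hits <- grep(pattern, entries, value = TRUE)
  if (length(hits) == 0L) {
    stop("No entry matches the pattern; archive contains: ",
         paste(entries, collapse = ", "))
  }

  # read.csv opens the unz() connection and closes it when done
  read.csv(unz(temp, hits[1]), skip = skip)
}

# Possible usage with the URL from the question (requires network access):
# dat <- read_csv_from_zip_url(
#   "http://api.worldbank.org/v2/en/country/gbr?downloadformat=csv",
#   skip = 2
# )
```

If the archive holds several .csv files, taking the first match is arbitrary; inspecting the result of `unzip(temp, list = TRUE)` and choosing the entry by name is likely the safer route.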