R: Scraping Site, Incrementing Loop by Date in URL, Saving To CSV


Problem Description



I'm relatively new to R and web scraping, so apologies for any inherently obvious mistakes.

I'm looking to scrape a CSV file off URL 1, increment by date to URL 2, then save each CSV file.

startdate <- as.Date("2007-07-01")
enddate <- as.Date(Sys.Date())

for(startdate in enddate){ # Loop through dates on each URL
    read.csv(url("http://api.foo.com/charts/data?output=csv&data=close&startdate=",startdate,"&enddate=",startdate,"&exchanges=bpi&dev=1"))
    startdate = startdate + 1
    startdate <- startdate[-c(1441,1442),] # Irrelevant to the question at hand; removes unwanted information auto-inserted into the CSV
    write.csv(startdate[-c(1441,1442),], startdate, 'csv', row.names = FALSE)
}

The following errors are output:

read.csv(url("http://api.foo.com/charts/data?output=csv&data=close&startdate=",startdate,"&enddate=",startdate,"&exchanges=bpi&dev=1"))
// Error in match.arg(method, c("default", "internal", "libcurl", "wininet")) :'arg' should be one of "default", "internal", "libcurl", "wininet"

and:

write.csv(startdate[c(1441,1442),], startdate, 'csv', row.names = FALSE)
# Error in charToDate(x) : character string is not in a standard unambiguous format

Any suggestions on how to fix these errors?

Solution

Based on your objective, "I'm looking to scrape a CSV file off URL 1, increment to URL 2 by date, then save each CSV file," here is some example code. First, what is going wrong: url() does not concatenate its arguments the way paste() does, so the extra positional arguments are matched to url()'s open, blocking, encoding, and method parameters; "&exchanges=bpi&dev=1" ends up as the method argument, which is why match.arg() complains. The second error most likely comes from reusing startdate as the loop variable, the downloaded data, and the write.csv() filename all at once. The fix is to build the full URL as a single string and keep the date, the data, and the filename as separate objects.
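To illustrate the url() fix on its own (a minimal sketch, using the placeholder API address from the question):

startdate <- as.Date("2007-07-01")

# Build the whole URL as one string first; url() will not paste pieces together.
u <- paste0("http://api.foo.com/charts/data?output=csv&data=close",
            "&startdate=", startdate, "&enddate=", startdate,
            "&exchanges=bpi&dev=1")
dat <- read.csv(url(u))

Putting it all together, with one CSV written per day: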

startdate <- as.Date("2016-01-01")
enddate <- as.Date(Sys.Date())

# Build the request URL for a given start date and end date.
geturl <- function(sdt, edt) {
    paste0("http://api.foo.com/charts/data?output=csv&data=close",
        "&startdate=", sdt, "&enddate=", edt, "&exchanges=bpi&dev=1")
} #geturl

dir.create("data")
garbage <- lapply(seq.Date(startdate, enddate, by="1 day"), function(dt) {
    dt <- as.Date(dt)
    dat <- read.csv(url(geturl(dt, dt)))
    write.csv(dat, paste0("data/dat-",format(dt, "%Y%m%d"),".csv"), row.names=FALSE)
})
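
If some dates return no data, a single failed request will abort the whole lapply() call. Here is a minimal sketch of a more forgiving version, wrapping each request in tryCatch() (this robustness wrapper is an addition, not part of the original answer):

garbage <- lapply(seq.Date(startdate, enddate, by = "1 day"), function(dt) {
    dt <- as.Date(dt, origin = "1970-01-01")
    tryCatch({
        dat <- read.csv(url(geturl(dt, dt)))
        write.csv(dat, paste0("data/dat-", format(dt, "%Y%m%d"), ".csv"),
                  row.names = FALSE)
    }, error = function(e) message("Skipping ", dt, ": ", conditionMessage(e)))  # log and move on
})

The saved files can later be read back into a single data frame with, for example, do.call(rbind, lapply(list.files("data", full.names = TRUE), read.csv)).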

Is this what you are looking for? Can you provide a sample link and some sample dates?
