A continuation of... Extracting data from an API using R
Question
I'm very new at this and working in R for my thesis. The code in this answer finally worked for me (Extracting data from an API using R), but I can't figure out how to add a loop to it. I keep getting the first page of the API when I need all 3360. Here's the code:
library(httr)
library(jsonlite)
r1 <- GET("http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2000-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s#soktraff")
r2 <- rawToChar(r1$content)
class(r2)
r3 <- fromJSON(r2)
r4 <- r3$dokumentlista$dokument
By the time I reach r4, it's already a data frame.
Thanks in advance!
Originally, I couldn't get a URL that had the page number as part of it. Now I have one (below), but I still haven't been able to loop over it: "http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2000-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s&p="
Answer
I think you can extract the URL of the next page from r3 as follows:
next_url <- r3$dokumentlista$`@nasta_sida`
# you need to re-check this, but sometimes I'm getting white spaces within the url,
# you may not face this problem, but in any case this line of code solved the issue
next_url <- gsub(' ', '', next_url)
GET(next_url)
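Building on that idea, the whole collection could be fetched by repeatedly following `@nasta_sida` until the API stops returning one. This is only a sketch, not the answer's original code: it assumes `@nasta_sida` is `NULL` (or empty) on the last page, which you should verify against a real response before relying on it.

```r
library(httr)
library(jsonlite)

start_url <- "http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2000-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s"

pages <- list()
next_url <- start_url
while (!is.null(next_url) && length(next_url) == 1 && nzchar(next_url)) {
  resp <- GET(next_url)
  parsed <- fromJSON(rawToChar(resp$content))
  # collect this page's documents
  pages[[length(pages) + 1]] <- parsed$dokumentlista$dokument
  # follow the API's own "next page" link, stripping any stray whitespace
  next_url <- gsub(' ', '', parsed$dokumentlista$`@nasta_sida`)
}
```

After the loop, `pages` is a list of per-page data frames, which still needs the pre-processing described below before it can be stacked.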
Update
I tried the URL with the page number for the first 10 pages and it worked:
my_dfs <- lapply(1:10, function(i){
my_url <- paste0("http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2000-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s&p=", i)
r1 <- GET(my_url)
r2 <- rawToChar(r1$content)
r3 <- fromJSON(r2)
r4 <- r3$dokumentlista$dokument
return(r4)
})
Update 2:
The extracted data frames are complex (e.g. some columns are lists of data frames), which is why a simple rbind will not work here. You'll have to do some pre-processing before you stack the data together; something like this would work:
library(dplyr)  # for the %>% pipe

my_dfs %>% lapply(function(df_0){
# Do some stuff here with the data, and choose the variables you need
# I chose the first 10 columns to check that I got 200 different observations
df_0[1:10]
}) %>% do.call(rbind, .)
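As a self-contained illustration of that pre-processing step (toy data frames here, not the riksdagen response), one way to choose "safe" columns automatically is to keep only the atomic ones before stacking:

```r
# Two toy pages, each with a list-column standing in for the
# nested data frames the API returns.
df_a <- data.frame(id = 1:2, titel = c("a", "b"))
df_a$bilaga <- list(data.frame(url = "x"), data.frame(url = "y"))
df_b <- data.frame(id = 3:4, titel = c("c", "d"))
df_b$bilaga <- list(NULL, data.frame(url = "z"))

my_dfs <- list(df_a, df_b)

# Keep only the plain (atomic) columns so do.call(rbind, .) can stack them.
atomic_only <- lapply(my_dfs, function(df_0) {
  df_0[sapply(df_0, is.atomic)]
})
combined <- do.call(rbind, atomic_only)
nrow(combined)  # 4
```

Dropping the list-columns loses the nested data, of course; if you need it, you'd have to flatten those columns separately instead.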