A follow-up to "Extracting data from an API using R"


Question

The code I have (a continuation of "Extracting data from an API using R") produces a very complicated output. I can extract almost everything I need, except for a data.frame that is nested within the list.

Without any extra handling, it gives me this error:

Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘10’, ‘11’, ‘12’, ‘13’, ‘14’, ‘15’, ‘16’, ‘17’, ‘18’, ‘19’, ‘2’, ‘20’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’

If I try to flatten or unlist it, the result is NULL.

In the example code, I've selected some variables that are easy to get, plus column 42, "dokintressent". From that column I need "intressent", which holds a list of names for each case. I have to query APIs from the Swedish legislature half a dozen times, but this is the trickier one.

When I remove column 42, the data.frame is built without any problem.

library(httr)      # GET()
library(jsonlite)  # fromJSON()
library(magrittr)  # %>%

# Fetch result pages 1 to 207; each page yields one data.frame of documents
my_dfs1 <- lapply(1:207, function(i){
  my_url <- paste0("http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2017-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s&p=", i)
  r1 <- GET(my_url)
  r2 <- rawToChar(r1$content)
  r3 <- fromJSON(r2)
  r4 <- r3$dokumentlista$dokument
  return(r4)
})

df <- my_dfs1 %>% lapply(function(df_0){
  df_0[c(12:14, 18, 42)]
}) %>% do.call(rbind, .)

I've noticed that the data I want is actually several data.frames per case. From "intressent", I need "namn". Basically, I need the final data set to look like this:

                     V12     V13    V14    V18    Namn
    Motion 1                                     c(name1, name2)

Recommended answer

You need to work on intressent on its own: extract what you need from it, then assign the result to a new column. Just make sure you end up with one simple data structure per row.

Alternatively, if it works better for you, you can paste the names together, separated by '-' for example, and then intressent reduces to a simple character vector.

df <- my_dfs1 %>% lapply(function(df_0){
  # choose the columns you want
  return_df <- df_0[c(12:14, 18)]
  # work on intressent: keep each case's names together in one row
  return_df$namn <- df_0$dokintressent$intressent %>% 
    lapply(function(x) list(x$namn)) %>% 
    do.call(rbind, .)                    # careful here: a simple unlist won't work
  return(return_df)
}) %>% 
  do.call(rbind, .)
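
The pasted-string alternative mentioned above could look roughly like the sketch below. It does not call the API. The objects `intressent` and `return_df` are small made-up stand-ins for `df_0$dokintressent$intressent` and `df_0[c(12:14, 18)]`, and the column name `titel` is only a placeholder, so treat this as an illustration of the idea rather than a drop-in replacement.

```r
# Hypothetical stand-in for df_0$dokintressent$intressent on one page:
# one small data.frame of names per motion (assumed shape, not real data).
intressent <- list(
  data.frame(namn = c("name1", "name2"), stringsAsFactors = FALSE),
  data.frame(namn = "name3", stringsAsFactors = FALSE)
)

# Hypothetical stand-in for the flat columns df_0[c(12:14, 18)].
return_df <- data.frame(titel = c("Motion 1", "Motion 2"),
                        stringsAsFactors = FALSE)

# Collapse each motion's names into a single string, so the new column
# is an ordinary character vector with exactly one element per row.
return_df$namn <- vapply(
  intressent,
  function(x) paste(x$namn, collapse = "-"),
  character(1)
)

print(return_df)
```

Because every row then holds a plain string, the per-page results can be combined with `do.call(rbind, .)` like any other flat data.frame. The trade-off is that the individual names have to be split again (for example with `strsplit`) if they are needed separately later, and '-' is a poor separator if any name itself contains a hyphen.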
