R: convert JSON to list to data.table

Question

I have a data.table in which one of the columns contains JSON. I am trying to extract the content so that each JSON field becomes its own column.

library(jsonlite)
library(data.table)

df<-data.table(a=c('{"tag_id":"34","response_id":2}',
                   '{"tag_id":"4","response_id":1,"other":4}',
                   '{"tag_id":"34"}'),stringsAsFactors=F)

The desired result, which leaves out the "other" variable:

   tag_id response_id 
1     "34"      2 
2     "4"       1 
3     "34"      NA

I have tried several versions of:

parseLog <- function(x){
  if (is.na(x))
    e=c(tag_id=NA,response_id=NA)
  else{
    j=fromJSON(x)
    e=c(tag_id=as.integer(j$tag_id),response_id=j$response_id)
  }
  e
}

This seems to work well for retrieving a list of vectors (or a list of lists if c is replaced by list), but when I try to convert the list to a data.table, something doesn't work as expected.

   parsed<-lapply(df$a,parseLog)
   rparsed<-do.call(rbind.data.frame,parsed)
   colnames(rparsed)<-c("tag_id","response_id")

This fails because of the missing value in the third row. How can I solve it in a clean, R-ish way? How can I make my parse function return NA for the missing value? Alternatively, is there a parameter "fill", like the one for rbind, that can be used with rbind.data.frame or an analogous method?

The dataset I am using has 11M rows, so performance is important.

Additionally, is there an equivalent of rbind.data.frame that produces a data.table? How would it be used? The documentation refers me to rbindlist, but the call below complains that the argument is unused, and calling rbindlist directly (without do.call) complains about the type of parsed:

rparsed<-do.call(rbindlist,fill=T,parsed)

The case I need to cover is more general. In a set of 11M records, all of the following situations occur:

df<-data.table(a=c('{"tag_id":"34","response_id":2}',
                   '{"trash":"34","useless":2}',                          
                   '{"tag_id":"4","response_id":1,"other":4}',
                   NA,
                   '{"response_id":"34"}', 
                   '{"tag_id":"34"}'),stringsAsFactors=F)

The output should contain only the tag_id and response_id columns.

Recommended answer

There might be a simpler way, but this seems to work:

library(data.table)
library(jsonlite)
df[, json := sapply(a, fromJSON)][, rbindlist(lapply(json, data.frame), fill=TRUE)]

#or if you need all the columns :
#df[, json := sapply(a, fromJSON)][,
#   c('tag_id', 'response_id') := rbindlist(lapply(json, data.frame), fill=TRUE)]

Output:

> df[, json := sapply(a, fromJSON)][, rbindlist(lapply(json, data.frame), fill=TRUE)]
   tag_id response_id
1:     34           2
2:      4           1
3:     34          NA
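
On the do.call(rbindlist, fill=T, parsed) attempt from the question: do.call expects the function's arguments bundled in a single list, so a loose fill=T is passed to do.call itself and reported as unused. rbindlist already takes the list of records as its first argument, so it can be called directly, as long as each record is a list (or data.frame) rather than an atomic vector built with c. A minimal sketch, where the parsed records below are made-up sample data:

```r
library(data.table)

# Two already-parsed records as named lists; the second lacks response_id.
parsed <- list(list(tag_id = "34", response_id = 2L),
               list(tag_id = "34"))

# rbindlist takes the list of records itself, so no do.call is needed.
# fill = TRUE pads fields that are missing from a record with NA.
res <- rbindlist(parsed, fill = TRUE)

# If do.call is used anyway, rbindlist's arguments go inside one list.
res2 <- do.call(rbindlist, list(parsed, fill = TRUE))
```

Both calls should give the same two-row data.table, with NA in response_id for the second record.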

The following solution was added after the question was edited with additional requests.

There are lots of ways to do this, but I find the simplest one is to drop the extra column when creating the data.frame, like this:

df[, json := sapply(a, fromJSON)][, 
   rbindlist(lapply(json, function(x) data.frame(x)[-3]), fill=TRUE)]

#   tag_id response_id
#1:     34           2
#2:      4           1
#3:     34          NA
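
The edited question also contains NA entries, records with neither field, and a record with only response_id. Dropping the third column by position does not cover those, and fromJSON is not expected to accept NA input. One possible way to handle them is to pick the wanted fields by name and substitute NA for anything absent. This is only a sketch under assumptions: the names wanted and parse_log are hypothetical helpers, not part of either package, and every value is kept as character because the sample data mixes strings and numbers.

```r
library(data.table)
library(jsonlite)

df <- data.table(a = c('{"tag_id":"34","response_id":2}',
                       '{"trash":"34","useless":2}',
                       '{"tag_id":"4","response_id":1,"other":4}',
                       NA,
                       '{"response_id":"34"}',
                       '{"tag_id":"34"}'))

# Hypothetical helper names; only these fields are kept.
wanted <- c("tag_id", "response_id")

# Parse one JSON string into a named list holding exactly the wanted fields.
# NA input and absent fields both yield NA_character_.
parse_log <- function(x) {
  j <- if (is.na(x)) list() else fromJSON(x)
  out <- lapply(wanted, function(k) {
    if (is.null(j[[k]])) NA_character_ else as.character(j[[k]])
  })
  setNames(out, wanted)
}

# Every record now has the same two fields, so fill = TRUE is not needed.
parsed <- rbindlist(lapply(df$a, parse_log))
```

The columns can be converted afterwards, for example with as.integer, if numeric values are needed. Parsing row by row may still be slow on 11M rows, so it is worth timing on a sample first.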
