R: convert JSON to list to data.table
Question
I have a data.table in which one of the columns contains JSON. I am trying to extract the content so that each variable becomes a column.
library(jsonlite)
library(data.table)
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
'{"tag_id":"4","response_id":1,"other":4}',
'{"tag_id":"34"}'),stringsAsFactors=F)
The desired result, which leaves out the "other" variable:
tag_id response_id
1 "34" 2
2 "4" 1
3 "34" NA
I have tried several versions of:
parseLog <- function(x){
if (is.na(x))
e=c(tag_id=NA,response_id=NA)
else{
j=fromJSON(x)
e=c(tag_id=as.integer(j$tag_id),response_id=j$response_id)
}
e
}
That seems to work well for retrieving a list of vectors (or a list of lists, if c is replaced by list), but when I try to convert the list to a data.table, something doesn't work as expected.
parsed<-lapply(df$a,parseLog)
rparsed<-do.call(rbind.data.frame,parsed)
colnames(rparsed)<-c("tag_id","response_id")
This is because of the missing value in the third row. How can I solve this in a clean, R-ish way? How can I make my parse function return NA for the missing value? Alternatively, is there a "fill" parameter, like the one for rbind, that can be used with rbind.data.frame or an analogous function?
The dataset I am using has 11M rows, so performance is important.
Additionally, is there an equivalent of rbind.data.frame that yields a data.table, and how would it be used? The documentation points me to rbindlist, but with do.call it complains that the parameter is unused, and when called directly (without do.call) it complains about the type of parsed:
rparsed<-do.call(rbindlist,fill=T,parsed)
The case I need to cover is more general: in a set of 11M records, every possible situation occurs:
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
'{"trash":"34","useless":2}',
'{"tag_id":"4","response_id":1,"other":4}',
NA,
'{"response_id":"34"}',
'{"tag_id":"34"}'),stringsAsFactors=F)
and the output should contain only the tag_id and response_id columns.
Recommended answer
There might be a simpler way, but this seems to work:
library(data.table)
library(jsonlite)
df[, json := sapply(a, fromJSON)][, rbindlist(lapply(json, data.frame), fill=TRUE)]
#or if you need all the columns :
#df[, json := sapply(a, fromJSON)][,
# c('tag_id', 'response_id') := rbindlist(lapply(json, data.frame), fill=TRUE)]
Output:
> df[, json := sapply(a, fromJSON)][, rbindlist(lapply(json, data.frame), fill=TRUE)]
tag_id response_id
1: 34 2
2: 4 1
3: 34 NA
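As a side note on the rbindlist errors mentioned in the question: rbindlist expects the whole list as its first argument, and each element must itself be a list, data.frame, or data.table rather than an atomic vector built with c(). In do.call(rbindlist, fill=T, parsed), the fill argument is handed to do.call itself, which has no such parameter, hence the "unused argument" complaint. The following is a minimal sketch with made-up records (the parsed list here is hypothetical, not the asker's data):

```r
library(data.table)

# Hypothetical parsed records, each a named list; the third has no response_id.
parsed <- list(
  list(tag_id = 34L, response_id = 2L),
  list(tag_id = 4L,  response_id = 1L),
  list(tag_id = 34L)
)

# Call rbindlist() directly on the list; fill = TRUE pads missing columns with NA.
rparsed <- rbindlist(parsed, fill = TRUE)

# If do.call() is really wanted, the arguments must be wrapped in one list.
rparsed2 <- do.call(rbindlist, list(parsed, fill = TRUE))

print(rparsed)
```

Both calls should give the same three-row table, with NA in response_id for the third record.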
This solution was added after the question was edited with additional requests.
There are lots of ways to do this, but I find the simplest is to drop the unwanted column when creating the data.frame, like this:
df[, json := sapply(a, fromJSON)][,
rbindlist(lapply(json, function(x) data.frame(x)[-3]), fill=TRUE)]
# tag_id response_id
#1: 34 2
#2: 4 1
#3: 34 NA
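For the more general input in the edited question (NA rows, records with only unrelated keys, records missing one of the two fields), dropping a column by position may not be enough. One possible approach, sketched here under assumptions, is to have the parser always return a list with exactly the wanted names, so that every record has the same shape before binding. The names parse_row and wanted are hypothetical, and coercing both fields to integer is an assumption based on the as.integer call in the asker's parseLog. The sketch still calls fromJSON once per row, so it has not been shown to be fast enough for 11M rows.

```r
library(jsonlite)
library(data.table)

# The extended example from the question.
df <- data.table(a = c('{"tag_id":"34","response_id":2}',
                       '{"trash":"34","useless":2}',
                       '{"tag_id":"4","response_id":1,"other":4}',
                       NA,
                       '{"response_id":"34"}',
                       '{"tag_id":"34"}'))

# Hypothetical: the only fields to keep.
wanted <- c("tag_id", "response_id")

# Hypothetical helper: always returns a list with exactly the wanted names.
parse_row <- function(x) {
  # An NA input is treated as an empty record.
  j <- if (is.na(x)) list() else fromJSON(x)
  out <- lapply(wanted, function(k) {
    v <- j[[k]]
    # A missing key becomes NA; present values are coerced to integer (assumption).
    if (is.null(v)) NA_integer_ else as.integer(v)
  })
  names(out) <- wanted
  out
}

# Every record has the same names, so fill = TRUE is not needed here.
result <- rbindlist(lapply(df$a, parse_row))
print(result)
```

Because unwanted keys such as "other", "trash", and "useless" are never copied into the output list, they cannot appear as columns in the result.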