R: Generic flattening of JSON to data.frame


Question

This question is about a generic mechanism for converting any collection of non-cyclical homogeneous or heterogeneous data structures into a dataframe. This can be particularly useful when dealing with the ingestion of many JSON documents or with a large JSON document that is an array of dictionaries.

There are several SO questions that deal with manipulating deeply nested JSON structures and turning them into dataframes using functionality such as plyr, lapply, etc. All the questions and answers I have found are about specific cases as opposed to offering a general approach for dealing with collections of complex JSON data structures.

In Python and Ruby I've been well-served by implementing a generic data structure flattening utility that uses the path to a leaf node in a data structure as the name of the value at that node in the flattened data structure. For example, the value my_data[['x']][[2]][['y']] would appear as result[['x.2.y']].

If one has a collection of these data structures that may not be entirely homogeneous, the key to flattening them successfully into a dataframe is to discover the names of all possible dataframe columns, e.g., by taking the union of all keys/names of the values in the individually flattened data structures.
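
A minimal sketch of the union-of-names step in R, assuming each document has already been flattened to a named list whose leaves are scalars. The names `flat_records` and `records_to_df` are hypothetical and used only for illustration; keys missing from a record are filled with NA.

```r
# Hypothetical input: two already-flattened records with different keys.
flat_records <- list(
  list(x.1 = 1, x.2.y = "e"),
  list(x.1 = 2, z = "q")
)

# Hypothetical helper: one column per name seen in any record.
records_to_df <- function(records) {
  # Union of all leaf names, in order of first appearance.
  all_names <- unique(unlist(lapply(records, names)))
  columns <- lapply(all_names, function(nm) {
    # A record that lacks this name contributes NA.
    vals <- lapply(records, function(rec) {
      if (is.null(rec[[nm]])) NA else rec[[nm]]
    })
    unlist(vals)
  })
  names(columns) <- all_names
  as.data.frame(columns, stringsAsFactors = FALSE)
}

df <- records_to_df(flat_records)
```

Building the result column by column lets each column take the type of its non-missing values. Leaves that are vectors of length greater than one would need extra handling that this sketch does not attempt.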

This seems like a common pattern and so I'm wondering whether someone has already built this for R. If not, I'll build it but, given R's unique promise-based data structures, I'd appreciate advice on an implementation approach that minimizes heap thrashing.

Solution

Hi @Sim, I had cause to reflect on your problem yesterday. Define:

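flatten <- function(x) {
    # Build the path-based names first, while the nesting is still intact.
    dumnames <- unlist(getnames(x, TRUE))
    # nametree names each scalar leaf as element 1 of a length-1 vector,
    # so drop that trailing ".1" (anchored, so inner ".1" parts are kept).
    dumnames <- sub("\\.1$", "", dumnames)
    repeat {
        # Remove one level of nesting; c() on list arguments returns a list,
        # so leaf types are kept.
        x <- do.call(.Primitive("c"), x)
        if (!any(vapply(x, is.list, logical(1)))) {
            names(x) <- dumnames
            return(x)
        }
    }
}

getnames <- function(x, recursive) {
    nametree <- function(x, parent_name, depth) {
        if (length(x) == 0)
            return(character(0))
        x_names <- names(x)
        if (is.null(x_names)) {
            # Unnamed container: use positional indices.
            x_names <- seq_along(x)
            x_names <- paste(parent_name, x_names, sep = "")
        } else {
            # Partially named: fill the blanks with positional indices.
            x_names[x_names == ""] <- seq_along(x)[x_names == ""]
            x_names <- paste(parent_name, x_names, sep = "")
        }
        if (!is.list(x) || (!recursive && depth >= 1L))
            return(x_names)
        x_names <- paste(x_names, ".", sep = "")
        lapply(seq_along(x), function(i) nametree(x[[i]], x_names[i], depth + 1L))
    }
    nametree(x, "", 0L)
}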

(getnames is adapted from AnnotationDbi:::make.name.tree)

(flatten is adapted from the discussion in "How to flatten a list to a list without coercion?": http://stackoverflow.com/questions/8139677/how-to-flatten-a-list-to-a-list-without-coercion)

As a simple example:

my_data<-list(x=list(1,list(1,2,y='e'),3))

> my_data[['x']][[2]][['y']]
[1] "e"

> out<-flatten(my_data)
> out
$x.1
[1] 1

$x.2.1
[1] 1

$x.2.2
[1] 2

$x.2.y
[1] "e"

$x.3
[1] 3

> out[['x.2.y']]
[1] "e"

So the result is a flattened list with roughly the naming structure you suggest. Coercion is also avoided, which is a plus.
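
For contrast, base R's unlist() also flattens a nested list, but it returns an atomic vector, so mixed leaf types are converted to a common type. A small sketch using the same my_data:

```r
my_data <- list(x = list(1, list(1, 2, y = 'e'), 3))

# unlist() flattens recursively but returns an atomic vector,
# so the numeric leaves are converted to character alongside "e".
flat <- unlist(my_data)
is.character(flat)   # TRUE
```

This is the coercion that the list-based flatten above sidesteps by keeping every leaf as its own list element.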

A more complicated example:

library(RJSONIO)
library(RCurl)
json.data<-getURL("http://www.reddit.com/r/leagueoflegends/.json")
dumdata<-fromJSON(json.data)
out<-flatten(dumdata)

UPDATE

A naive way to remove the trailing .1:

my_data <- list(x = list(1, list(1, 2, y = 'e'), 3))

> sub("\\.1$", "", unlist(getnames(my_data, TRUE)))
[1] "x.1"   "x.2.1" "x.2.2" "x.2.y" "x.3"
