"Embedded" data.frame from JSON

Problem description

I have the following code, which extracts data from a JSON file.

library(jsonlite)

file_path <- 'C:/some/file/path.json'

df <- jsonlite::fromJSON(txt = file_path , 
                        simplifyVector = FALSE,
                        simplifyDataFrame = TRUE,
                        simplifyMatrix = FALSE,
                        flatten = FALSE)

The data structure is highly nested. My approach extracts 99% of it just fine, but in one particular part of the data I came across a phenomenon that I would describe as an "embedded" data frame:

df <- structure(
  list(
    ID = c(1L, 2L, 3L, 4L, 5L),
    var1 = c('a', 'b', 'c', 'd', 'e'),
    var2 = structure(
      list(
        var2a = c('v', 'w', 'x', 'y', 'z'),
        var2b = c('vv', 'ww', 'xx', 'yy', 'zz')),
      .Names = c('var2a', 'var2b'),
      row.names = c(NA, 5L),
      class = 'data.frame'),
    var3 = c('aa', 'bb', 'cc', 'dd', 'ee')),
  .Names = c('ID', 'var1', 'var2', 'var3'),
  row.names = c(NA, 5L),
  class = 'data.frame')

# Looks like this:
#   ID var1 var2.var2a var2.var2b var3
# 1  1    a          v         vv   aa
# 2  2    b          w         ww   bb
# 3  3    c          x         xx   cc
# 4  4    d          y         yy   dd
# 5  5    e          z         zz   ee

This looks like a normal data frame, and it behaves like that for the most part.

class(df)
# [1] "data.frame"

df[1,]
#   ID var1 var2.var2a var2.var2b var3
# 1  1    a          v         vv   aa

dim(df)
# [1] 5 4
# One less than expected due to embedded data frame

lapply(df, class)
# $ID
# [1] "integer"
# 
# $var1
# [1] "character"
# 
# $var2
# [1] "data.frame"
# 
# $var3
# [1] "character"

str(df)
# 'data.frame': 5 obs. of  4 variables:
#  $ ID  : int  1 2 3 4 5
#  $ var1: chr  "a" "b" "c" "d" ...
#  $ var2:'data.frame':  5 obs. of  2 variables:
#   ..$ var2a: chr  "v" "w" "x" "y" ...
#   ..$ var2b: chr  "vv" "ww" "xx" "yy" ...
#  $ var3: chr  "aa" "bb" "cc" "dd" ...

What is going on here, why is jsonlite creating this odd structure instead of just a simple data.frame? Can I avoid this behaviour, and if not how can I most elegantly rectify this? I've used the approach below, but it feels very hacky, at best.

# X is the data frame returned by fromJSON().
# Flag the columns that hold an embedded data frame.
is_df <- vapply(X, is.data.frame, logical(1))
# Keep the ordinary columns and append the embedded ones at the end.
newX <- cbind(X[, !is_df, drop = FALSE], X[, is_df])


Update

The suggested workaround solves my issue, but I still feel like I don't understand the strange embedded data.frame structure. I would have thought that such a structure would be illegal by R data format conventions, or at least behave differently in terms of subsetting using [. I have opened a separate question on that.
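On whether such a structure is legal: a data frame is a list of columns with the same number of rows, and base R does not require each column to be an atomic vector. The following is a minimal sketch in base R only; the names `inner` and `outer` are made up for illustration and do not come from the original data.

```r
# A column of a data frame may itself be a data frame,
# as long as it has the same number of rows.
inner <- data.frame(var2a = c("v", "w"), var2b = c("vv", "ww"),
                    stringsAsFactors = FALSE)
outer <- data.frame(ID = 1:2, var1 = c("a", "b"),
                    stringsAsFactors = FALSE)
outer$var2 <- inner          # assign the whole data frame as ONE column

ncol(outer)                  # counts top-level columns only
is.data.frame(outer$var2)    # the embedded column keeps its class
outer$var2$var2a             # reach into the embedded column with $
second_row <- outer[2, ]     # row subsetting also subsets the embedded frame
```

This is why `dim()` reports one column for `var2` while printing shows `var2.var2a` and `var2.var2b`: the print method expands the embedded columns for display only.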

Recommended answer

I think you want to flatten your df object:

# Round-trip through JSON and flatten nested data frames while parsing
json <- toJSON(df)
flat_df <- fromJSON(json, flatten = TRUE)

str(flat_df)

'data.frame':   5 obs. of  5 variables:
 $ ID        : int  1 2 3 4 5
 $ var1      : chr  "a" "b" "c" "d" ...
 $ var3      : chr  "aa" "bb" "cc" "dd" ...
 $ var2.var2a: chr  "v" "w" "x" "y" ...
 $ var2.var2b: chr  "vv" "ww" "xx" "yy" ...

Is that closer to what you're looking for?
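As a possible alternative to the JSON round trip, jsonlite also exports a `flatten()` function that works on an existing data frame, and `flatten = TRUE` can be passed when the file is first read. The sketch below uses a hypothetical two-record JSON string (not the asker's file) in which `var2` is a nested object. That nesting is what appears to produce the embedded data frame.

```r
library(jsonlite)

# Hypothetical input: "var2" is a nested object inside each record.
json <- '[
  {"ID": 1, "var1": "a", "var2": {"var2a": "v", "var2b": "vv"}, "var3": "aa"},
  {"ID": 2, "var1": "b", "var2": {"var2a": "w", "var2b": "ww"}, "var3": "bb"}
]'

nested       <- fromJSON(json)                  # var2 becomes an embedded data frame
flat_at_read <- fromJSON(json, flatten = TRUE)  # flatten while parsing
flat_after   <- jsonlite::flatten(nested)       # flatten an existing data frame

names(nested)
names(flat_after)
```

The call is written as `jsonlite::flatten()` because other packages, purrr for example, export a function with the same name.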
