"Embedded" data.frame in R: what is it, what is it called, and why does it behave the way it does?


Question

I have the following data structure in R:

df <- structure(
  list(
    ID = c(1L, 2L, 3L, 4L, 5L),
    var1 = c('a', 'b', 'c', 'd', 'e'),
    var2 = structure(
      list(
        var2a = c('v', 'w', 'x', 'y', 'z'),
        var2b = c('vv', 'ww', 'xx', 'yy', 'zz')),
      .Names = c('var2a', 'var2b'),
      row.names = c(NA, 5L),
      class = 'data.frame'),
    var3 = c('aa', 'bb', 'cc', 'dd', 'ee')),
  .Names = c('ID', 'var1', 'var2', 'var3'),
  row.names = c(NA, 5L),
  class = 'data.frame')

# Looks like this:
#   ID var1 var2.var2a var2.var2b var3
# 1  1    a          v         vv   aa
# 2  2    b          w         ww   bb
# 3  3    c          x         xx   cc
# 4  4    d          y         yy   dd
# 5  5    e          z         zz   ee

This looks like a normal data frame, and for the most part it behaves like one; but look at the length and class of the columns below:

class(df)
# [1] "data.frame"

df[1,]
#   ID var1 var2.var2a var2.var2b var3
# 1  1    a          v         vv   aa

dim(df)
# [1] 5 4
# One less than expected due to embedded data frame

lapply(df, class)
# $ID
# [1] "integer"
# 
# $var1
# [1] "character"
# 
# $var2
# [1] "data.frame"
# 
# $var3
# [1] "character"

lapply(df, length)
# $ID
# [1] 5
#
# $var1
# [1] 5
#
# $var2
# [1] 2
#
# $var3
# [1] 5

str(df)
# 'data.frame':   5 obs. of  4 variables:
#  $ ID  : int  1 2 3 4 5
#  $ var1: chr  "a" "b" "c" "d" ...
#  $ var2:'data.frame':   5 obs. of  2 variables:
#   ..$ var2a: chr  "v" "w" "x" "y" ...
#   ..$ var2b: chr  "vv" "ww" "xx" "yy" ...
#  $ var3: chr  "aa" "bb" "cc" "dd" ...

My questions:

I've never come across this before. Is it a common format for some of you out there? What are potential use cases?

I called this "embedded" for lack of a better word. Somebody suggested "nested", but I don't think that's right; see the separate section with tidyverse tibbles below.

I would have expected the structure command above to fail, because I thought that data.frames are essentially lists in which each element (column) has the same number of elements (rows). That rule seems to be violated in this example, as var2 has length = 2 (its number of columns!). Yet, surprisingly, subsetting df succeeds in the usual way:

df[3,]
#   ID var1 var2.var2a var2.var2b var3
# 3  3    c          x         xx   cc

What is going on here?

I don't think I could call it a "nested" structure; that terminology is used for nested data.frames, which look and behave like this:

library(tidyverse)
df <- data_frame(
  x = c(1L, 2L, 3L),
  nested = list(data_frame(x = c('a', 'b', 'c')), 
                data_frame(x = c('a', 'b', 'c')), 
                data_frame(x = c('d', 'e', 'f'))))
unnest(df)
# # A tibble: 9 × 2
#       x     x
#   <int> <chr>
# 1     1     a
# 2     1     b
# 3     1     c
# 4     2     a
# 5     2     b
# 6     2     c
# 7     3     d
# 8     3     e
# 9     3     f

Recommended answer

I think the structure is pretty clear:

str(df)
# 'data.frame':   5 obs. of  4 variables:
#  $ ID  : int  1 2 3 4 5
#  $ var1: chr  "a" "b" "c" "d" ...
#  $ var2:'data.frame':   5 obs. of  2 variables:
#   ..$ var2a: chr  "v" "w" "x" "y" ...
#   ..$ var2b: chr  "vv" "ww" "xx" "yy" ...
#  $ var3: chr  "aa" "bb" "cc" "dd" ...

It's a data.frame with a column (var2) that contains a data.frame. This isn't super easy to create, so I'm not quite sure how you did it, but it isn't technically "illegal" in R.
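As an illustration of how such a structure can come about, here is a minimal sketch in base R. The object names (inner, flat, embedded) are made up for this example and are not from the question. It contrasts data.frame(), which splices the inner columns into a flat result, with `$<-` assignment, which stores the inner data frame as a single column. As far as I know, reading nested JSON (for example with jsonlite::fromJSON()) is another common way to end up with columns like this.

```r
# Hypothetical example data, not the asker's original object.
inner <- data.frame(var2a = c("v", "w", "x"), var2b = c("vv", "ww", "xx"))

# data.frame() splices the inner columns into the result (flat, 3 columns).
flat <- data.frame(ID = 1:3, var2 = inner)
names(flat)
# [1] "ID"         "var2.var2a" "var2.var2b"

# Assigning with `$<-` keeps the inner data frame as ONE column (2 columns).
embedded <- data.frame(ID = 1:3)
embedded$var2 <- inner
names(embedded)
# [1] "ID"   "var2"
class(embedded$var2)
# [1] "data.frame"
```

Both objects print almost identically, which is why the difference is easy to miss until you check names(), dim(), or str().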

data.frames can contain matrices and other data.frames. So R doesn't just look at the length() of each element; it looks at the dim() of the elements to see whether each has the right number of "rows".
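The row-count point can be checked directly. The sketch below uses a hypothetical data frame called parent with one matrix column and one data-frame column; it compares length() with NROW() per column and then tries to attach a column with the wrong number of rows. It only illustrates the behaviour and is not a description of R's internals.

```r
# Hypothetical example: one ordinary, one matrix, one data-frame column.
parent <- data.frame(ID = 1:3)
parent$m <- matrix(1:6, nrow = 3)
parent$inner <- data.frame(a = c("x", "y", "z"), b = c("p", "q", "r"))

# length() differs per column: 3 for ID, 6 for m, 2 for inner.
sapply(parent, length)

# NROW() agrees for every column: 3 rows each.
sapply(parent, NROW)

# Row subsetting works across all three kinds of column.
parent[2, ]

# A column with the wrong number of rows is refused.
res <- tryCatch({
  parent$bad <- data.frame(a = 1:2)   # 2 rows, but parent has 3
  "accepted"
}, error = function(e) "rejected")
res
```

So the constraint on a data frame is better stated as "every column has the same number of rows" than "every column has the same length".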

I often "fix" or expand these data.frames using

fixed <- do.call("data.frame", df)
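To see what that call does, here is a small self-contained sketch. It rebuilds a three-row version of the question's structure (the values are shortened for the example) and compares the column counts and names before and after flattening.

```r
# Three-row stand-in for the question's df, with var2 as a data-frame column.
df <- data.frame(ID = 1:3, var1 = c("a", "b", "c"))
df$var2 <- data.frame(var2a = c("v", "w", "x"), var2b = c("vv", "ww", "xx"))
df$var3 <- c("aa", "bb", "cc")

ncol(df)
# [1] 4

# do.call() passes each column of df as an argument to data.frame(),
# which splices the inner data frame's columns into the result.
fixed <- do.call("data.frame", df)

ncol(fixed)
# [1] 5
names(fixed)
# [1] "ID"         "var1"       "var2.var2a" "var2.var2b" "var3"
```

One caveat: in R versions before 4.0.0, data.frame() converts character columns to factors by default, so there you may want to pass stringsAsFactors = FALSE as an extra argument.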
