"Embedded" data.frame in R: what is it, what is it called, and why does it behave the way it does?
Question
I have the following data structure in R:
df <- structure(
list(
ID = c(1L, 2L, 3L, 4L, 5L),
var1 = c('a', 'b', 'c', 'd', 'e'),
var2 = structure(
list(
var2a = c('v', 'w', 'x', 'y', 'z'),
var2b = c('vv', 'ww', 'xx', 'yy', 'zz')),
.Names = c('var2a', 'var2b'),
row.names = c(NA, 5L),
class = 'data.frame'),
var3 = c('aa', 'bb', 'cc', 'dd', 'ee')),
.Names = c('ID', 'var1', 'var2', 'var3'),
row.names = c(NA, 5L),
class = 'data.frame')
# Looks like this:
# ID var1 var2.var2a var2.var2b var3
# 1 1 a v vv aa
# 2 2 b w ww bb
# 3 3 c x xx cc
# 4 4 d y yy dd
# 5 5 e z zz ee
This looks like a normal data frame, and it behaves like one for the most part; but see the length and class of the columns below:
class(df)
# [1] "data.frame"
df[1,]
# ID var1 var2.var2a var2.var2b var3
# 1 1 a v vv aa
dim(df)
# [1] 5 4
# One less than expected due to embedded data frame
lapply(df, class)
# $ID
# [1] "integer"
#
# $var1
# [1] "character"
#
# $var2
# [1] "data.frame"
#
# $var3
# [1] "character"
lapply(df, length)
# $ID
# [1] 5
#
# $var1
# [1] 5
#
# $var2
# [1] 2
#
# $var3
# [1] 5
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ ID : int 1 2 3 4 5
# $ var1: chr "a" "b" "c" "d" ...
# $ var2:'data.frame': 5 obs. of 2 variables:
# ..$ var2a: chr "v" "w" "x" "y" ...
# ..$ var2b: chr "vv" "ww" "xx" "yy" ...
# $ var3: chr "aa" "bb" "cc" "dd" ...
My questions:
I've never come across this before. Is it a common format for some of you out there? What are potential use cases?
I called this "embedded" for lack of a better word. Somebody suggested "nested", but I don't think that's right; see the separate section on tidyverse tibbles below.
I would have expected the structure command above to fail, because I thought that data.frames are essentially lists in which each element (column) has the same number of elements (rows). This rule seems to be violated in this example, as var2 has length = 2 (the number of columns!). Yet subsetting df surprisingly succeeds in the usual way:
df[3,]
# ID var1 var2.var2a var2.var2b var3
# 3 3 c x xx cc
What is going on here?
I don't think I can call it a "nested" structure; that terminology is used for nested data.frames, which look and behave like this:
library(tidyverse)
df <- data_frame(
x = c(1L, 2L, 3L),
nested = list(data_frame(x = c('a', 'b', 'c')),
data_frame(x = c('a', 'b', 'c')),
data_frame(x = c('d', 'e', 'f'))))
unnest(df)
# # A tibble: 9 × 2
# x x
# <int> <chr>
# 1 1 a
# 2 1 b
# 3 1 c
# 4 2 a
# 5 2 b
# 6 2 c
# 7 3 d
# 8 3 e
# 9 3 f
Recommended answer
I think the structure is clear from the output of str():
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ ID : int 1 2 3 4 5
# $ var1: chr "a" "b" "c" "d" ...
# $ var2:'data.frame': 5 obs. of 2 variables:
# ..$ var2a: chr "v" "w" "x" "y" ...
# ..$ var2b: chr "vv" "ww" "xx" "yy" ...
# $ var3: chr "aa" "bb" "cc" "dd" ...
It's a data.frame with a column (var2) that contains a data.frame. This isn't super easy to create, so I'm not quite sure how you did it, but it isn't technically "illegal" in R.
data.frames can contain matrices and other data.frames. So R doesn't just look at the length() of the elements; it looks at the dim() of the elements to check that each has the right number of "rows".
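To make the row-count point concrete, here is a minimal sketch using only base R. The object name parent and its columns id, m and d are made up for illustration and are not part of the question's data. It shows that length() differs across the columns while NROW() is the same for all of them:

```r
# Hypothetical example: a data frame whose columns include a matrix
# and another data frame, each with the same number of rows (3).
parent <- data.frame(id = 1:3)
parent$m <- matrix(1:6, nrow = 3)                  # matrix column
parent$d <- data.frame(a = letters[1:3],
                       b = LETTERS[1:3])           # data frame column

dim(parent)             # 3 3: three rows, three top-level columns
sapply(parent, length)  # id = 3, m = 6, d = 2
sapply(parent, NROW)    # 3 for every column
```

The length() values disagree (3, 6 and 2), but every column reports three rows, which is why such an object is still a consistent data.frame.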
I often "fix" or expand these data.frames using:
fixed <- do.call("data.frame", df)
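As a rough illustration of what that call does, the sketch below builds a small stand-in object (the name embedded and its values are assumptions, not the question's df) and flattens it. Because data.frame() receives each column as a separate argument, an embedded data frame is spliced in as ordinary columns whose names are prefixed with the outer column name:

```r
# Hypothetical stand-in for the question's data: one plain column
# plus one column that is itself a data frame.
embedded <- data.frame(ID = 1:3)
embedded$var2 <- data.frame(var2a = c("x", "y", "z"),
                            var2b = c("xx", "yy", "zz"))

# Each top-level column becomes an argument to data.frame(), which
# expands the embedded data frame into separate top-level columns.
flat <- do.call("data.frame", embedded)

ncol(embedded)                    # 2
names(flat)                       # "ID" "var2.var2a" "var2.var2b"
any(sapply(flat, is.data.frame))  # FALSE
```

Note that on R versions before 4.0.0, data.frame() converts character columns to factors by default, so the column types of the flattened result may differ from the original there.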