Why does 'out of bounds' indexing differ between a matrix and a data.frame?


Question

I'm sure this is kind of basic, but I'd just like to really understand the logic of R data structures here.

If I subset a matrix with an out-of-bounds index, I get exactly that error:

m <- matrix(data = c("foo", "bar"), nrow = 1)
m[2,]
# Error in m[2, ] : subscript out of bounds

If I do the same to a data frame, however, I get a row of all NAs:

df <- data.frame(foo = "foo", bar = "bar")
df[2,]
#    foo  bar
# NA <NA> <NA>

If I subset into a non-existent data frame column, I get the familiar error:

df[, 3]
# Error in `[.data.frame`(df, , 3) : undefined columns selected

I know (roughly) that data frame rows are weird and to be treated carefully, but I don't quite see the connection to the above behavior.

Can someone explain why R behaves in this way for non-existent df rows?

Update

To be sure, giving NA on out-of-bounds subsets is normal R behavior for 1D vectors:

vec <- c("foo", "bar")
vec[3]
# [1] NA

So in a way, the odd one out here is matrix subsetting, not data frame subsetting, depending on where you're starting out. Still, the different 2D subsetting behavior (m[2, ] vs df[2, ]) might strike a dense user (as I am right now) as inconsistent.

Recommended answer


"Can someone explain why R behaves in this way[?]"

Short answer: No, probably not.

Longer answer: Once upon a time I was thinking about something similar and read this thread on R-devel: Definition of [[. Basically it boils down to:


"The semantics of [ and [[ don't seem to be fully specified in the Reference manual. [...] I assume that these are features, not bugs, but I can't find documentation for them"

Duncan Murdoch, a former member of the R core team, gives a very nice reply:


"There is more documentation in the man page for Extract, but I think it is incomplete. The most complete documentation is of course the source code*, but it may not answer the question of what's intentional and what's accidental"

As mentioned in the R-devel thread, the only description in the manual is 3.4.1 Indexing by vectors:


"If i is positive and exceeds length(x) then the corresponding selection is NA"

But this applies to "indexing of simple vectors". Similar out-of-bounds indexing for "non-simple" vectors does not seem to be described. Duncan Murdoch again:
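To make the manual's rule concrete, here is a minimal sketch of what `[` does with a positive out-of-bounds index on an atomic vector, a plain list, and the rows of a data frame. Whether a list or a data frame counts as a "simple vector" in the manual's sense is exactly what the thread leaves open, so the comparison is only illustrative. The object names are made up for this example.

```r
# Atomic vector: a positive index beyond length(x) selects NA
vec <- c("foo", "bar")
vec_oob <- vec[c(1, 3)]
print(vec_oob)

# Plain list with `[`: no error either; the missing element comes back as NULL
lst <- list("foo", "bar")
lst_oob <- lst[3]
print(lst_oob)

# Data frame rows: a non-existent row is returned as a row filled with NA
df <- data.frame(foo = "foo", bar = "bar")
df_oob <- df[2, ]
print(df_oob)
```

In all three cases `[` quietly pads the result instead of failing, which is the behavior the question observed for `df[2, ]`.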


"So what is a simple vector? That is not explicitly defined, and it probably should be."

Thus, it seems that no one knows the answer to your "why" question.

See also "8.2.13 nonexistent value in subscript" in the excellent R Inferno by Patrick Burns, and the section "Missing/out of bounds indices" in Hadley Wickham's book Advanced R.

*Source code for the [ subset operator. A search for R_MSG_subs_o_b (which corresponds to the error message "subscript out of bounds") provides no obvious clue why out-of-bounds (OOB) [ indexing of matrices, and OOB indexing with [[, give an error, whereas OOB [ indexing of "simple vectors" results in NA.
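The contrast described in this footnote can be reproduced with a short sketch. The helper `try_index` is hypothetical (it is not part of R); it only wraps `tryCatch` so that an error message is returned as a string instead of stopping the script.

```r
# Hypothetical helper: return the value of an expression, or its error message
try_index <- function(expr) {
  tryCatch(expr, error = function(e) paste("Error:", conditionMessage(e)))
}

vec <- c("foo", "bar")
m <- matrix(data = c("foo", "bar"), nrow = 1)

r_vec_single <- try_index(vec[3])    # `[` on a vector: NA
r_vec_double <- try_index(vec[[3]])  # `[[` on a vector: error
r_mat_row    <- try_index(m[2, ])    # `[` with a row index on a matrix: error
r_mat_linear <- try_index(m[3])      # `[` with a single index on a matrix: NA

print(list(vec_single = r_vec_single, vec_double = r_vec_double,
           mat_row = r_mat_row, mat_linear = r_mat_linear))
```

Note that the same matrix gives NA when it is indexed with a single subscript, as if it were a plain vector, and only raises the error when indexed by row and column.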
