Why does 'out of bounds' indexing differ between a matrix and a data.frame?
Question
I'm sure this is kind of basic, but I'd just like to really understand the logic of R data structures here.
If I subset a matrix by index out of bounds, I get exactly that error:
m <- matrix(data = c("foo", "bar"), nrow = 1)
m[2,]
# Error in m[2, ] : subscript out of bounds
If I do the same to a data frame, however, I get a row of all NA:
df <- data.frame(foo = "foo", bar = "bar")
df[2,]
#     foo  bar
# NA <NA> <NA>
If I subset into a non-existent data frame column, I get the familiar error:
df[, 3]
# Error in `[.data.frame`(df, , 3) : undefined columns selected
I know (roughly) that data frame rows are weird and to be treated carefully, but I don't quite see the connection to the above behavior.
Can someone explain why R behaves in this way for non-existent df rows?
Update
To be sure, returning NA for out-of-bounds subsets is normal R behavior for 1D vectors:
vec <- c("foo", "bar")
vec[3]
# [1] NA
So in a way, the odd one out here is matrix subsetting, not data frame subsetting, depending on where you start from.
Still, the different 2D subsetting behavior (m[2, ] vs df[2, ]) might strike a dense user (as I am right now) as inconsistent.
Recommended answer
Can someone explain why R behaves in this way[?]
Short answer: No, probably not.
Longer answer:
Once upon a time I was thinking about something similar and read this thread on R-devel: Definition of [[. Basically it boils down to:
The semantics of [ and [[ don't seem to be fully specified in the Reference manual. [...] I assume that these are features, not bugs, but I can't find documentation for them
Duncan Murdoch, a former member of the R core team, gives a very nice reply:
There is more documentation in the man page for Extract, but I think it is incomplete. The most complete documentation is of course the source code*, but it may not answer the question of what's intentional and what's accidental
As mentioned in the R-devel thread, the only description in the manual is 3.4.1 Indexing by vectors:
If i is positive and exceeds length(x) then the corresponding selection is NA
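The quoted rule can be checked directly. The following is a minimal sketch, not taken from the thread; the variable names are illustrative only. It also shows that a matrix indexed with a single (linear) subscript follows the same vector rule, since a matrix is a vector with a dim attribute.

```r
# Atomic vector: a positive index past length(x) yields NA of the vector's type.
x <- c("foo", "bar")
x[3]            # NA
x[c(1, 3)]      # "foo" NA

# The same rule holds for numeric vectors.
y <- c(10, 20)
y[5]            # NA

# Single-index (linear) subsetting of a matrix follows the vector rule,
# unlike the two-index form m[2, ], which raises an error.
m <- matrix(c("foo", "bar"), nrow = 1)
m[3]            # NA, no error
```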
But, this applies to "indexing of simple vectors". Similar out of bounds indexing for "non-simple" vectors does not seem to be described. Duncan Murdoch again:
So what is a simple vector? That is not explicitly defined, and it probably should be.
Thus, it may seem like no one knows the answer to your why question.
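Even without a documented rationale, the observed behaviors from the question can be collected in one runnable script. This is only a sketch: try_subset is a hypothetical helper introduced here for illustration, and the exact error text may differ by R version or locale.

```r
# Hypothetical helper: return the value of an expression,
# or its error message as a string if it fails.
try_subset <- function(expr) {
  tryCatch(expr, error = function(e) paste("Error:", conditionMessage(e)))
}

m  <- matrix(c("foo", "bar"), nrow = 1)
df <- data.frame(foo = "foo", bar = "bar")

res_m_row  <- try_subset(m[2, ])   # matrix row out of bounds: error
res_df_row <- try_subset(df[2, ])  # data frame row out of bounds: one row of NA
res_df_col <- try_subset(df[, 3])  # data frame column out of bounds: error

print(res_m_row)
print(res_df_row)
print(res_df_col)
```

If silent NA rows are a concern in practice, one option is to compare the row index against nrow(df) before subsetting.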
See also "8.2.13 nonexistent value in subscript" in the excellent R Inferno by Patrick Burns, and the section "Missing/out of bounds indices" in Hadley's book.
*Source code for the [ subset operator. A search for R_MSG_subs_o_b (which corresponds to the error message "subscript out of bounds") provides no obvious clue why OOB [ indexing of matrices and OOB [[ indexing give an error, whereas OOB [ indexing of "simple vectors" results in NA.
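The contrast between [ and [[ mentioned in the footnote can be reproduced with a short sketch. The objects x and lst are made up for illustration; the errors are captured with tryCatch so the script runs to the end.

```r
x   <- c("foo", "bar")
lst <- list(a = 1, b = 2)

# Out-of-bounds [ on a vector or list does not fail.
x[3]      # NA
lst[3]    # a list of length one whose element is NULL

# Out-of-bounds [[ fails for both; capture the condition objects.
err_x   <- tryCatch(x[[3]],   error = function(e) e)
err_lst <- tryCatch(lst[[3]], error = function(e) e)

conditionMessage(err_x)
conditionMessage(err_lst)
```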