Difference between [] and $ operators for subsetting

Question

I am trying to subset a data frame by using a variable name. I have it working but there is a part which I don't quite understand.

Originally I had this: rownames(mtcars[mtcars$hp > 150, ]).

Then, rather than hard-coding "hp", I wanted to assign "hp" to a variable: foo <- "hp" and subset with that. I got it working using this: rownames (mtcars[mtcars[foo] >150,]). (Thanks to link which stopped me from playing with the $ operator.)

But, as I was building up this statement, I noticed there was a difference between the two. For mtcars$hp > 150, I get this output:

 [1] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[25]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

For mtcars[foo] > 150, I get this:

                       hp
Mazda RX4           FALSE
Mazda RX4 Wag       FALSE
Datsun 710          FALSE
Hornet 4 Drive      FALSE
Hornet Sportabout    TRUE
...

Are these two of the same "type"? Is there any reason why R displays the first one without rownames and the second one with rownames?

Perhaps I've naively thought that $ and [] were more or less equivalent. I can get the same final result, but I am curious and worried if my assumptions had been wrong. "Fortunately", I ignored this difference and carried on and got the same final result.

Thanks!

Recommended answer

Below we will use the one-row data frame in order to provide briefer output:

mtcars1 <- mtcars[1, ]

Note the differences among these. We can use class as in class(mtcars["hp"]) to investigate the class of the return value.

The first two correspond to the code in the question and return a data frame and plain vector respectively. The key differences between [ and $ are that [ (1) can specify multiple columns, (2) allows passing of a variable as the index and (3) returns a data frame (although see examples later on) whereas $ (1) can only specify a single column, (2) the index must be hard coded and (3) it returns a vector.

mtcars1["hp"]  # returns data frame
##            hp
## Mazda RX4 110

mtcars1$hp # returns plain vector
## [1] 110
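To connect this back to the two outputs in the question, here is a small sketch (the names by_dollar and by_bracket are only illustrative). Comparing a plain vector with 150 gives a plain logical vector, while comparing a one-column data frame with 150 gives a logical matrix that carries row and column names, which is why R prints the two differently:

```r
foo <- "hp"

by_dollar  <- mtcars$hp > 150     # comparison on a plain numeric vector
by_bracket <- mtcars[foo] > 150   # comparison on a one-column data frame

is.vector(by_dollar)    # plain logical vector, no names attached
is.matrix(by_bracket)   # logical matrix with row names and the column name "hp"
dim(by_bracket)         # one row per car, one column

# The logical values themselves agree, so both pick out the same rows
all(by_dollar == by_bracket[, 1])
identical(rownames(mtcars[by_dollar, ]), rownames(mtcars[by_bracket, ]))
```

So the two are not the same "type", but as a row index they select the same rows.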

Other examples where index is a single element. Note that the first and second examples below are actually the same as drop = TRUE is the default.

mtcars1[, "hp"] # returns plain vector
## [1] 110  

mtcars1[, "hp", drop = TRUE] # returns plain vector
## [1] 110

mtcars1[, "hp", drop = FALSE] # returns data frame
##            hp
## Mazda RX4 110

Also there is the [[ operator which is like the $ operator except it can accept a variable as the index whereas $ requires the index to be hard coded:

mtcars1[["hp"]] # returns plain vector
## [1] 110

Others where index specifies multiple elements. $ and [[ cannot be used with multiple elements so these examples only use [:

mtcars1[c("mpg", "hp")] # returns data frame
##           mpg  hp
## Mazda RX4  21 110

mtcars1[, c("mpg", "hp")] # returns data frame
##           mpg  hp
## Mazda RX4  21 110

mtcars1[, c("mpg", "hp"), drop = FALSE] # returns data frame
##           mpg  hp
## Mazda RX4  21 110

mtcars1[, c("mpg", "hp"), drop = TRUE] # returns list (mtcars1 has only one row)
## $mpg
## [1] 21
## 
## $hp
## [1] 110

[

mtcars[foo] can return more than one column if foo is a vector with more than one element, e.g. mtcars[c("hp", "mpg")], and in all cases the return value is a data.frame even if foo has only one element (as it does in the question).

There is also mtcars[, foo, drop = FALSE], which returns the same value as mtcars[foo], so it always returns a data frame. With drop = TRUE it returns the column itself if foo specifies a single column. If foo specifies multiple columns, the result is still a data frame unless only one row is selected, in which case it is a list (as in the mtcars1 example above).
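The effect of drop = TRUE depends on how many rows and columns are selected, so a quick check may help. This is a sketch based on how `[` behaves for data frames in the R versions I am aware of; the variable names are only illustrative:

```r
cols <- c("mpg", "hp")

many_rows <- mtcars[, cols, drop = TRUE]   # two columns, all 32 rows
one_row   <- mtcars[1, cols, drop = TRUE]  # two columns, a single row
single    <- mtcars[, "hp", drop = TRUE]   # one column

is.data.frame(many_rows)  # several rows and columns: stays a data frame
is.data.frame(one_row)    # a single row is dropped ...
is.list(one_row)          # ... to a plain list
is.numeric(single)        # a single column is dropped to the column vector
```

In other words, the list result only shows up when exactly one row is selected across several columns.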

[[

On the other hand mtcars[[foo]] only works if foo has one element and it returns that column, not a data frame.

$

mtcars$hp also only works for a single column, like [[, and returns the column, not a data frame containing that column.

mtcars$hp is like mtcars[["hp"]]; however, there is no possibility to pass a variable index with $. One can only hard-code the index with $.
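A short illustration of the [[ and $ points, using foo from the question (the other names are only illustrative). Note that `$` does not evaluate foo; it looks for a column literally called "foo":

```r
foo <- "hp"

same_column <- identical(mtcars[[foo]], mtcars$hp)  # [[ accepts the variable
no_such_col <- is.null(mtcars$foo)                  # mtcars has no column named "foo"

# [[ with more than one name is not a multi-column selection; here it fails
multi <- tryCatch(mtcars[[c("hp", "mpg")]], error = function(e) e)
failed <- inherits(multi, "error")
```

For a data frame, a missing column accessed with `$` quietly gives NULL rather than an error, which is an easy mistake to overlook.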

subset

Note that this works:

subset(mtcars, hp > 150)

returning a data frame containing those rows where the hp column exceeds 150:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
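subset takes the column name unquoted, so it does not directly help when the name is stored in a variable. A minimal sketch comparing it with the `[[` form, which does accept a variable (by_subset and by_bracket are illustrative names):

```r
foo <- "hp"

by_subset  <- subset(mtcars, hp > 150)        # column name written literally
by_bracket <- mtcars[mtcars[[foo]] > 150, ]   # column name taken from foo

# Both approaches keep the same cars
identical(rownames(by_subset), rownames(by_bracket))
nrow(by_subset)
```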

Other objects

The above pertain to data frames but other objects that can use $, [ and [[ will have their own rules. In particular if m is a matrix, e.g. m <- as.matrix(BOD), then m[, 1] is a vector, not a one column matrix, but m[, 1, drop = FALSE] is a one column matrix. m[[1]] and m[1] are both the first element of m, not the first column. m$a does not work at all.
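The matrix behaviour described above can be checked directly. This sketch uses the built-in BOD data set, whose columns are Time and demand; res is just an illustrative name for the captured result:

```r
m <- as.matrix(BOD)

is.matrix(m[, 1])                 # FALSE: the column is dropped to a vector
is.matrix(m[, 1, drop = FALSE])   # TRUE: stays a one-column matrix
m[[1]] == m[1]                    # both give the first element, not a column

# $ is not defined for matrices (atomic vectors), so this raises an error
res <- tryCatch(m$Time, error = function(e) "error")
```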

Help

See ?Extract for more information. Also ?"$", ?"[" and ?"[[" all get to the same page, as well.
