Using multiple criteria in subset function and logical operators
Problem description
If I want to select a subset of data in R, I can use the subset function. I wanted to base an analysis on data that matched one of a few criteria, e.g. that a certain variable was either 1, 2 or 3. I tried
myNewDataFrame <- subset(bigfive, subset = (bigfive$bf11==(1||2||3)))
It always selected only the values that matched the first of the criteria, here 1. My assumption was that it would start with 1 and, if that evaluated to "false", go on to 2 and then to 3; if none matched, the statement after == would be "false", and if one of them matched, it would be "true".
I used
newDataFrame <- subset(bigfive, subset = (bigfive$bf11==c(1,2,3)))
But I would like to be able to select data via logical operators, so: why did the first approach not work?
Recommended answer
The correct operator here is %in%. Here is an example with dummy data:
set.seed(1)
dat <- data.frame(bf11 = sample(4, 10, replace = TRUE),
                  foo  = runif(10))
which gives:
> head(dat)
  bf11       foo
1    2 0.2059746
2    2 0.1765568
3    3 0.6870228
4    4 0.3841037
5    1 0.7698414
6    4 0.4976992
The subset of dat where bf11 equals any of the set 1, 2, 3 is taken as follows using %in%:
> subset(dat, subset = bf11 %in% c(1,2,3))
   bf11       foo
1     2 0.2059746
2     2 0.1765568
3     3 0.6870228
5     1 0.7698414
8     3 0.9919061
9     3 0.3800352
10    1 0.7774452
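For contrast, the bf11 == c(1,2,3) comparison from the question is not a set-membership test. As far as I understand R's recycling rules, == compares element by element and recycles the shorter vector, so row i is only checked against one of 1, 2 or 3. The sketch below uses a hypothetical toy vector (the bf11 values from the dummy data above, typed in by hand) to show that the two filters pick different rows:

```r
# Hypothetical toy vector: the bf11 values from the dummy data, typed in by hand
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# %in% asks "is this element a member of the set?" for every element
by_membership <- bf11 %in% c(1, 2, 3)

# == recycles c(1, 2, 3) to 1 2 3 1 2 3 1 2 3 1 and compares position by position.
# R warns here because 10 is not a multiple of 3; the warning is silenced for the demo.
by_recycling <- suppressWarnings(bf11 == c(1, 2, 3))

which(by_membership)  # rows 1 2 3 5 8 9 10
which(by_recycling)   # rows 2 3 9 10
```

So the == version silently drops rows that do hold 1, 2 or 3 but happen to sit at the "wrong" position.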
As to why your original didn't work, break it down to see the problem. Look at what 1||2||3 evaluates to:
> 1 || 2 || 3
[1] TRUE
and you'd get the same using | instead. As a result, the subset() call would only return rows where bf11 was TRUE (or something that evaluated to TRUE). In a numeric comparison TRUE is coerced to 1, which is why you only ever got the rows where bf11 was 1.
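This can be checked directly. The following is a minimal sketch, again assuming a hand-typed toy vector in place of bigfive$bf11 (the name bf11 and its values are only for illustration):

```r
# Hypothetical stand-in for bigfive$bf11
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# || turns each non-zero number into TRUE and collapses everything to one value
cond <- 1 || 2 || 3
cond            # TRUE

# When compared with a number, TRUE is coerced to 1
TRUE == 1       # TRUE

# So the original filter is the same test as bf11 == 1
original <- bf11 == (1 || 2 || 3)
identical(original, bf11 == 1)   # TRUE
which(original)                  # rows 5 and 10, the only rows holding 1
```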
What you could have written would have been something like:
subset(dat, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)
This gives the same result as my earlier subset() call. The point is that you need a series of single comparisons, not a comparison of a series of options. But as you can see, %in% is far more useful and less verbose in such circumstances. Notice also that I have to use | because I want to compare each element of bf11 against 1, 2, and 3 in turn. Compare:
> with(dat, bf11 == 1 || bf11 == 2)
[1] TRUE
> with(dat, bf11 == 1 | bf11 == 2)
[1] TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
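One caveat about the || line above: the single TRUE comes from older R versions, where || quietly used only the first element of each side. As far as I know, R 4.3.0 and later signal an error when || is given an operand longer than one. If a single TRUE/FALSE is what you want, it is safer to say so explicitly. A small sketch, using the same hypothetical toy vector as a stand-in for dat$bf11:

```r
# Hypothetical stand-in for dat$bf11
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# Element-wise OR: one TRUE/FALSE per row, which is what subset() needs
elementwise <- bf11 == 1 | bf11 == 2

# Explicit single answers, instead of relying on || with long vectors
any_match   <- any(bf11 == 1 | bf11 == 2)     # does any row match?
first_match <- bf11[1] == 1 || bf11[1] == 2   # does the first row match?

elementwise
any_match
first_match
```

Here || is only ever given length-one operands, so it behaves the same regardless of R version.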