Different results for 2 subset data methods in R
Question
I'm subsetting my data, and I'm getting different results from the following two lines of code:
subset(df, x==1)
df[df$x==1,]
x is of type integer.
Am I doing something wrong? Thanks in advance.
Recommended answer
Without example data, it is difficult to say what your problem is. However, my hunch is that the following probably explains your problem:
df <- data.frame(quantity=c(1:3, NA), item=c("Coffee", "Americano", "Espresso", "Decaf"))
df
  quantity      item
1        1    Coffee
2        2 Americano
3        3  Espresso
4       NA     Decaf
Let's subset with [:
df[df$quantity == 2,]
   quantity      item
2         2 Americano
NA       NA      <NA>
Now let's subset with subset:
subset(df, quantity == 2)
  quantity      item
2        2 Americano
We see that the subsetting output differs depending on how NA values are treated. I think of this as follows: with subset, you are explicitly stating that you want the subset for which the condition is verifiably true. df$quantity==2 produces a vector of TRUE/FALSE values, but where quantity is missing, it is impossible to assign TRUE or FALSE. This is why we get the following output with an NA at the end:
df$quantity==2
[1] FALSE  TRUE FALSE    NA
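The NA in that vector comes from the comparison itself, not from the data frame. A minimal sketch of this, using a throwaway vector q (a hypothetical stand-in for df$quantity, not part of the original answer):

```r
# q is a hypothetical stand-in for df$quantity
q <- c(1:3, NA)

# Comparing NA with a value yields NA, not FALSE
cmp <- q == 2
print(cmp)

# Indexing a plain vector with an NA in the logical index also yields an NA element
print(q[cmp])

# which() keeps only the positions where the condition is TRUE
print(which(cmp))
```

The same rule applies row-wise when the logical vector is used to index a data frame.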
The function [ takes this vector but cannot tell whether the row with an NA index should be kept or dropped, so it returns a row filled with NA: instead of NA Decaf we get NA <NA>. If you prefer using [, you could use the following instead:
df[which(df$quantity == 2),]
  quantity      item
2        2 Americano
This translates the logical condition df$quantity == 2 into a vector of row numbers where the logical condition is "verifiably" satisfied.
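If you would rather keep a logical index than row numbers, two other options are possible. The sketch below is not from the original answer; it reuses the example data, and the names by_isna and by_in are only illustrative:

```r
df <- data.frame(quantity = c(1:3, NA),
                 item = c("Coffee", "Americano", "Espresso", "Decaf"))

# Explicitly treat a missing quantity as "not a match"
by_isna <- df[!is.na(df$quantity) & df$quantity == 2, ]

# %in% never returns NA, so rows with a missing quantity count as non-matches
by_in <- df[df$quantity %in% 2, ]

print(by_isna)
print(by_in)

# Compare both against subset()
print(all.equal(by_isna, subset(df, quantity == 2)))
print(all.equal(by_in, subset(df, quantity == 2)))
```

Which form to use is mostly a matter of taste; the point is that the NA case is handled on purpose rather than left to [.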