Conditional mean statement
Problem description
I have a dataset named bwght which contains the variable cigs (cigarettes smoked per day).
When I calculate the mean of cigs in the dataset bwght using mean(bwght$cigs), I get 2.08.
Only 212 of the 1388 women in the sample smoke (and 1176 do not smoke):
summary(bwght$cigs>0)
which gives:
   Mode   FALSE    TRUE    NA's
logical    1176     212       0
I'm asked to find the average of cigs among the women who smoke (the 212).
I'm having a hard time finding the right syntax for excluding the non-smokers (cigs = 0). I have tried:
mean(bwght$cigs| bwght$cigs>0)
mean(bwght$cigs>0 | bwght$cigs=TRUE)
if (bwght$cigs > 0){
sum(bwght$cigs)
}
x <-as.numeric(bwght$cigs, rm="0");
mean(x)
But nothing seems to work! Can anyone please help me?
Recommended answer
If you want to exclude the non-smokers, you have a few options. The easiest is probably this:
mean(bwght[bwght$cigs>0,"cigs"])
With a data frame, the first index is the row and the second is the column. So, you can subset using dataframe[1,2] to get the first row, second column. You can also use a logical condition in the row selection. By using bwght$cigs>0 as the first element, you are subsetting to only the rows where cigs is greater than zero.
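As a minimal sketch of that indexing, here is a toy data frame (the name toy and its values are made up for illustration, since bwght itself isn't reproduced here), with a few spellings that should give the same result:

```r
# Toy stand-in for bwght; the values are invented for illustration.
toy <- data.frame(cigs = c(0, 0, 10, 0, 20, 0),
                  id   = 1:6)

# Logical row index, column selected by name.
smokers_mean <- mean(toy[toy$cigs > 0, "cigs"])

# Same idea, subsetting the vector directly.
smokers_mean_vec <- mean(toy$cigs[toy$cigs > 0])

# subset() evaluates the condition inside the data frame.
smokers_mean_sub <- mean(subset(toy, cigs > 0)$cigs)

print(smokers_mean)  # 15
```

Note that a logical index containing NA would carry NA into the result; that isn't an issue here because the summary shows no NAs in cigs.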
Your other attempts didn't work for the following reasons:
mean(bwght$cigs| bwght$cigs>0)
This is effectively a logical comparison. You're asking for the TRUE / FALSE result of bwght$cigs OR bwght$cigs>0, and then taking the mean of it. R does accept logical data in mean(): it treats TRUE as 1 and FALSE as 0, so what you get back is the proportion of rows where the condition is TRUE (here, the share of smokers), not the average number of cigarettes.
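For what it's worth, a quick check on made-up data (the cigs vector below is invented, not taken from bwght) suggests mean() does take a logical vector, counting TRUE as 1 and FALSE as 0, so the result is a share of TRUE values rather than an average of cigs:

```r
# Made-up values for illustration only.
cigs <- c(0, 0, 10, 0, 20, 0)

# Nonzero numbers count as TRUE, so this is TRUE wherever cigs != 0.
is_smoker <- cigs | cigs > 0

# mean() of a logical vector is the proportion of TRUE values.
share_smokers <- mean(is_smoker)
print(share_smokers)  # 2 of the 6 values are TRUE
```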
mean(bwght$cigs>0 | bwght$cigs=TRUE)
Same problem. The | operator returns a logical, so R would be taking the mean of logicals. On top of that, bwght$cigs=TRUE uses = where a comparison needs ==, so as written this line should fail to parse.
if(bwght$cigs > 0){sum(bwght$cigs)}
By any chance, were you a SAS programmer originally? This looks like what I used to type at first. Basically, if() doesn't work the same way in R as it does in SAS. In that example, you are using bwght$cigs > 0 as the if condition, which won't work because if() expects a single TRUE/FALSE value. Older versions of R look only at the first element of the vector resulting from bwght$cigs > 0 (with a warning), and since R 4.2.0 a condition of length greater than one is an error. R handles looping differently from SAS; check out functions like lapply, tapply, and so on.
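A rough sketch of the R way of doing this, on made-up data (the cigs vector is invented for illustration): collapse the condition to a single value before handing it to if(), or skip if() entirely and let tapply() compute the mean within each group.

```r
# Made-up values for illustration only.
cigs <- c(0, 0, 10, 0, 20, 0)

# if() wants a single TRUE/FALSE, so collapse the vector with any() first.
if (any(cigs > 0)) {
  total <- sum(cigs[cigs > 0])
}

# tapply() applies a function within groups: mean of cigs by smoker status.
group_means <- tapply(cigs, cigs > 0, mean)
print(group_means)
```

The element of group_means named TRUE is the mean among smokers, which is the quantity the question asks for.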
x <-as.numeric(bwght$cigs, rm="0")
mean(x)
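I honestly don't know what this was meant to do. as.numeric() has no rm argument, and as far as I can tell the extra argument is simply ignored, so x is the same as bwght$cigs and mean(x) is again the overall mean, with or without the quotes. You may have been thinking of na.rm = TRUE, which is an argument of mean() that drops missing values (NA), not zeros.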