Conditional mean statement


Problem description

I have a dataset named bwght which contains the variable cigs (cigarettes smoked per day).

When I calculate the mean of cigs in the dataset bwght using mean(bwght$cigs), I get 2.08.

Only 212 of the 1388 women in the sample smoke (the other 1176 do not):

summary(bwght$cigs>0) gives:

Mode      FALSE    TRUE    NA's 
logical    1176     212       0

I'm asked to find the average of cigs among the women who smoke (the 212).

I'm having a hard time finding the right syntax for excluding the non-smokers (cigs = 0). I have tried:

  • mean(bwght$cigs| bwght$cigs>0)
  • mean(bwght$cigs>0 | bwght$cigs=TRUE)
  • if (bwght$cigs > 0){ sum(bwght$cigs) }
  • x <-as.numeric(bwght$cigs, rm="0"); mean(x)

But nothing seems to work! Can anyone please help me?

Recommended answer

If you want to exclude the non-smokers, you have a few options. The easiest is probably this:

mean(bwght[bwght$cigs>0,"cigs"])

With a data frame, the first index is the row and the second is the column. So, you can subset using dataframe[1,2] to get the first row, second column. You can also use a logical condition in the row position. By using bwght$cigs>0 as the first index, you keep only the rows where cigs is greater than zero, and "cigs" as the second index picks out that column as a plain vector for mean().
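
As a rough illustration of that row/column subsetting, here is a minimal sketch on a made-up data frame. The name toy and its values are assumptions standing in for bwght, and the two extra spellings are just alternatives that should give the same number.

```r
# Hypothetical stand-in for bwght: six women, two of whom smoke
toy <- data.frame(cigs = c(0, 0, 10, 0, 20, 0))

# Logical condition in the row position, column name in the column position
m1 <- mean(toy[toy$cigs > 0, "cigs"])

# Same filter applied directly to the column vector
m2 <- mean(toy$cigs[toy$cigs > 0])

# subset() performs the same row filter
m3 <- mean(subset(toy, cigs > 0)$cigs)

print(c(m1, m2, m3))  # each is (10 + 20) / 2 = 15
```

On the real data the same pattern would be applied to bwght instead of toy.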

Your other attempts didn't work for the following reasons:

mean(bwght$cigs| bwght$cigs>0)

This is effectively a logical operation, not a filter. | is the element-wise OR, so you're asking for the TRUE / FALSE result of bwght$cigs OR bwght$cigs>0 for each woman, and then taking the mean of that. mean() does accept logicals (TRUE counts as 1 and FALSE as 0), so what you get back is the share of women who smoke, 212/1388 or about 0.15, rather than the average number of cigarettes.
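
For what it's worth, a quick check on made-up values suggests that mean() is happy to take a logical vector and simply returns the proportion of TRUE values. The vector toy_cigs below is hypothetical, not taken from bwght.

```r
# Hypothetical cigs values: two smokers out of six
toy_cigs <- c(0, 0, 10, 0, 20, 0)

# > binds tighter than |, so this is toy_cigs | (toy_cigs > 0);
# the numeric side is coerced to logical (0 is FALSE, non-zero is TRUE)
smokes <- toy_cigs | toy_cigs > 0
print(smokes)

# TRUE counts as 1 and FALSE as 0, so this is the share of smokers
share <- mean(smokes)
print(share)  # 2 / 6
```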

mean(bwght$cigs>0 | bwght$cigs=TRUE)

Same problem. The | sign returns a logical, so R would be taking the mean of logicals. On top of that, a single = is not a comparison (that would be ==), so I'd expect this line to fail with a syntax error before mean() is even called.

if(bwght$cigs > 0){sum(bwght$cigs)}

By any chance, were you a SAS programmer originally? This looks like what I used to type at first. Basically, if() doesn't work the same way in R as it does in SAS. In R it is not applied row by row: it expects a single TRUE or FALSE. Here you are using bwght$cigs > 0 as the condition, which is a vector of 1388 values. Older versions of R look only at the first element and give a warning; from R 4.2.0 on this is an error. Even if the condition passed, sum(bwght$cigs) would add up the whole column, not just the smokers. R handles looping differently from SAS; check out vectorised subsetting and functions like lapply, tapply, and so on.
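
A small sketch of the vectorised style, again on a hypothetical toy data frame rather than the real bwght: a logical index replaces the row-by-row if, and tapply computes the mean within each group.

```r
# Hypothetical stand-in for bwght
toy <- data.frame(cigs = c(0, 0, 10, 0, 20, 0))

# One TRUE/FALSE per row, computed for the whole column at once
smoker <- toy$cigs > 0

# Sum over smokers only, with no if() and no explicit loop
total_smoked <- sum(toy$cigs[smoker])
print(total_smoked)  # 10 + 20 = 30

# Mean of cigs within each group; the result is named "FALSE" and "TRUE"
group_means <- tapply(toy$cigs, smoker, mean)
print(group_means)
```

The "TRUE" entry of group_means is the conditional mean the question asks for.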

x <-as.numeric(bwght$cigs, rm="0")
mean(x)

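As far as I can tell, this doesn't exclude anything. as.numeric() only converts the type, and I don't know of any rm argument that drops values, with or without the quotes, so mean(x) would still be the mean over all 1388 women. Perhaps you were thinking of na.rm = TRUE in mean(), but that removes NA values only, not zeros.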
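
If the idea behind rm was something like the na.rm argument of mean(), the sketch below (on a hypothetical vector x that contains one NA) shows the difference between dropping missing values and dropping zeros.

```r
# Hypothetical values, including one missing observation
x <- c(0, 0, 10, NA, 20, 0)

# na.rm = TRUE drops the NA but keeps the zeros: (0 + 0 + 10 + 20 + 0) / 5
mean_all <- mean(x, na.rm = TRUE)
print(mean_all)  # 6

# Keeping only positive values; which() also discards the NA position
mean_pos <- mean(x[which(x > 0)])
print(mean_pos)  # (10 + 20) / 2 = 15
```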
