Getting different results using aggregate() and sum() functions in R
Question
I'm trying to get a summary data frame of the total quantities of the variables prop.damage and crop.damage by the STATE variable, using the aggregate() function in R with the following code:
stormdata$prop.damage <- with(stormdata, ifelse(PROPDMGEXP == 'K', (PROPDMG * 10^3), ifelse(PROPDMGEXP == 'M', (PROPDMG * 10^6), ifelse(PROPDMGEXP == 'B', (PROPDMG * 10^9), NA))))
stormdata$crop.damage <- with(stormdata, ifelse(CROPDMGEXP == 'K', (CROPDMG * 10^3), ifelse(CROPDMGEXP == 'M', (CROPDMG * 10^6), ifelse(CROPDMGEXP == 'B', (CROPDMG * 10^9), NA))))
damagecost <- with(stormdata, aggregate(x = prop.damage + crop.damage, by = list(STATE), FUN = sum, na.rm = TRUE))
damagecost <- damagecost[order(damagecost$x, decreasing = TRUE), ]
Here the PROPDMGEXP and CROPDMGEXP variables hold exponent codes ('K', 'M', 'B') that are used as multipliers for the PROPDMG and CROPDMG numeric variables. My main data set is stormdata.
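The same K/M/B conversion can also be written as a lookup in a named vector instead of nested ifelse() calls. A minimal sketch, assuming the exponent codes are exactly 'K', 'M' and 'B' (anything else, including an empty string, becomes NA, as with the ifelse() chain); the helper name to_dollars is hypothetical:

```r
# Hypothetical helper: convert a damage amount plus an exponent code to dollars.
# Codes other than "K", "M" or "B" (e.g. "" or NA) yield NA, like the ifelse() chain.
to_dollars <- function(amount, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  # as.character() so that a factor is matched by its labels, not its integer codes
  amount * unname(multipliers[as.character(exp_code)])
}

# e.g. stormdata$prop.damage <- with(stormdata, to_dollars(PROPDMG, PROPDMGEXP))
to_dollars(c(2.5, 1, 3), c("K", "M", ""))
```

The as.character() call matters if PROPDMGEXP is a factor: indexing a vector by a factor uses the factor's integer codes rather than its labels.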
With the aggregate() code, I get the following:
> head(damagecost)
Group.1 x
8 CA 120211639720
13 FL 27302948100
38 MS 14804212820
63 TX 12550131850
20 IL 11655920860
2 AL 9505473250
But if, for example, I do the addition "manually" for California ('CA'), I get this:
> sum(stormdata$prop.damage[stormdata$STATE == 'CA'], na.rm = TRUE) + sum(stormdata$crop.damage[stormdata$STATE == 'CA'], na.rm = TRUE)
[1] 127115859410
I don't understand why I'm getting different results.
Answer
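It turns out that both variables, prop.damage and crop.damage, had NA values in them, and those NAs were affecting the result when the variables were added inside the aggregate() call. In R, adding a number to NA gives NA, so every row where either value is missing makes prop.damage + crop.damage NA, and na.rm = TRUE then drops that whole row, including the amount that was not missing. The "manual" calculation instead removes the NAs from each column separately, so it keeps those amounts.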
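A small reproduction of the problem, on a made-up data frame whose column names mirror the question's (the values are invented), together with two possible ways to get per-state totals that agree with the "manual" sum. This is a sketch rather than the only fix:

```r
# Made-up data: the second CA row has property damage but no crop damage.
storm <- data.frame(
  STATE       = c("CA", "CA", "FL"),
  prop.damage = c(100, 50, 20),
  crop.damage = c(10, NA, 5)
)

# Approach from the question: 50 + NA is NA, so na.rm = TRUE drops the
# whole second row and CA comes out as 110 instead of 160.
bad <- aggregate(x = storm$prop.damage + storm$crop.damage,
                 by = list(STATE = storm$STATE), FUN = sum, na.rm = TRUE)
print(bad)

# Fix 1: add the columns row by row, ignoring NAs, then aggregate.
storm$total <- rowSums(storm[, c("prop.damage", "crop.damage")], na.rm = TRUE)
fix1 <- aggregate(total ~ STATE, data = storm, FUN = sum)
print(fix1)

# Fix 2: sum each column per state (dropping NAs), then add the totals.
per_col <- aggregate(storm[, c("prop.damage", "crop.damage")],
                     by = list(STATE = storm$STATE), FUN = sum, na.rm = TRUE)
per_col$total <- per_col$prop.damage + per_col$crop.damage
print(per_col)
```

The second approach is essentially the manual calculation from the question, applied to every state at once. Note that with na.rm = TRUE a state whose values are all NA gets a total of 0 rather than NA.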