How to sum the values of a numeric variable based on a string variable
Question
Consider the following data frame:
df <- data.frame(numeric=c(1,2,3,4,5,6,7,8,9,10), string=c("a", "a", "b", "b", "c", "d", "d", "e", "d", "f"))
print(df)
   numeric string
1        1      a
2        2      a
3        3      b
4        4      b
5        5      c
6        6      d
7        7      d
8        8      e
9        9      d
10      10      f
It has a numeric variable and a string variable. Now I would like to create another data frame in which the string variable lists only the unique values "a", "b", "c", "d", "e", "f", and the numeric variable holds the sum of the numeric values for each of those strings in the previous data frame, resulting in this data frame:
print(new_df)
  numeric string
1       3      a
2       7      b
3       5      c
4      22      d
5       8      e
6      10      f
This can be done with a for loop, but that would be rather inefficient on large datasets, so I would prefer other options. I tried using the dplyr package, but did not get the expected result:
library(dplyr)
df %>% group_by(string) %>% summarize(result = sum(numeric))
  result
1     55
Recommended answer
This could be a function-masking issue: plyr also defines summarise and mutate, and if plyr is loaded after dplyr, its versions take precedence. plyr's summarise ignores the grouping set by group_by, so the sum is taken over the whole column, which gives 55. We can explicitly call the summarise from dplyr:
library(dplyr)
df %>%
  group_by(string) %>%
  dplyr::summarise(numeric = sum(numeric))
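As a further option that sidesteps the masking problem altogether, the same grouped sum can be sketched in base R with aggregate(). This is not part of the original answer; new_df is simply the name borrowed from the question, and the column reordering is only there to match the layout the question shows.

```r
# Rebuild the example data from the question.
df <- data.frame(
  numeric = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  string  = c("a", "a", "b", "b", "c", "d", "d", "e", "d", "f")
)

# Sum `numeric` within each distinct value of `string` (base R only).
new_df <- aggregate(numeric ~ string, data = df, FUN = sum)

# aggregate() puts the grouping column first; reorder to match the question.
new_df <- new_df[, c("numeric", "string")]
print(new_df)
```

Because aggregate() lives in the stats package that ships with R, it is unaffected by whether plyr or dplyr was loaded last.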