data.frame Group By column
Question
I have a data frame DF.
Say DF is:
A B
1 1 2
2 1 3
3 2 3
4 3 5
5 3 6
Now I want to combine the rows that share the same value in column A and get the sum of column B for each group.
For example:
A B
1 1 5
2 2 3
3 3 11
I am currently doing this with an SQL query through the sqldf function, but for some reason it is very slow. Is there a more convenient way to do this? I could also do it manually with a for loop, but that is slow as well. My SQL query is "Select A,Count(B) from DF group by A".
In general, whenever I use for loops instead of vectorized operations, performance is extremely slow, even for single procedures.
Recommended answer
This is a common question. In base R, the option you're looking for is aggregate. Assuming your data.frame is called "mydf", you can use the following.
> aggregate(B ~ A, mydf, sum)
A B
1 1 5
2 2 3
3 3 11
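As a self-contained sketch of the call above, the snippet below rebuilds the example data from the question and runs the same aggregate. The name mydf follows the answer; the names totals and by_group are only illustrative. It also shows tapply, another base R function that is not part of the original answer, as a possible alternative when a named vector of sums is enough.

```r
# Rebuild the example data from the question.
mydf <- data.frame(A = c(1, 1, 2, 3, 3),
                   B = c(2, 3, 3, 5, 6))

# Formula interface: sum B within each distinct value of A.
# The result is a data.frame with one row per group.
totals <- aggregate(B ~ A, data = mydf, FUN = sum)
print(totals)

# Assumed alternative: tapply returns a named array of sums,
# with the group values of A as the names.
by_group <- tapply(mydf$B, mydf$A, sum)
print(by_group)
```

Both calls are vectorized, so no explicit for loop over the groups is needed.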
I would also recommend looking into the "data.table" package.
> library(data.table)
> DT <- data.table(mydf)
> DT[, sum(B), by = A]
A V1
1: 1 5
2: 2 3
3: 3 11
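The summed column above gets the default name V1. A minimal sketch of one way to name it, assuming the data.table package is installed and using the example data from the question (the name res is only illustrative):

```r
library(data.table)  # assumption: data.table is installed

# Rebuild the example data from the question.
mydf <- data.frame(A = c(1, 1, 2, 3, 3),
                   B = c(2, 3, 3, 5, 6))
DT <- as.data.table(mydf)

# Wrapping the expression in .() lets the result column be named;
# here the per-group sum is called B instead of the default V1.
res <- DT[, .(B = sum(B)), by = A]
print(res)
```

Groups appear in the order in which their values first occur in A.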