data.frame Group By column

Problem description

I have a data frame DF.

Suppose DF is:

  A B
1 1 2
2 1 3
3 2 3
4 3 5
5 3 6 

Now I want to combine the rows by column A and get the sum of column B for each value of A.

For example:

  A B
1 1 5
2 2 3
3 3 11

I am currently doing this with an SQL query through the sqldf function, but for some reason it is very slow. Is there a more convenient way to do this? I could also do it manually with a for loop, but that is slow as well. My SQL query is "Select A,Count(B) from DF group by A".

In general, whenever I use for loops instead of vectorized operations, performance is extremely slow, even for single procedures.

Recommended answer

This is a common question. In base R, the function you're looking for is aggregate. Assuming your data.frame is called "mydf", you can use the following.

> aggregate(B ~ A, mydf, sum)
  A  B
1 1  5
2 2  3
3 3 11
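
For a self-contained version, here is a minimal sketch that rebuilds the example data from the question and runs the same aggregate call. The construction of mydf is an assumption based on the table shown in the question. The rowsum call is an additional base R option, not part of the original answer, included only for comparison.

```r
# Rebuild the example data frame from the question (assumed column types: numeric).
mydf <- data.frame(A = c(1, 1, 2, 3, 3),
                   B = c(2, 3, 3, 5, 6))

# Formula interface: sum B within each distinct value of A.
res <- aggregate(B ~ A, data = mydf, FUN = sum)
print(res)

# rowsum() is another base function for grouped sums; it returns a
# one-column matrix whose row names are the group labels.
rs <- rowsum(mydf$B, group = mydf$A)
print(rs)
```

Both calls are vectorized over the groups, so no explicit for loop is needed.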

I would also recommend looking into the "data.table" package.

> library(data.table)
> DT <- data.table(mydf)
> DT[, sum(B), by = A]
   A V1
1: 1  5
2: 2  3
3: 3 11
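
The result column above gets the default name V1. As a small variation (a sketch, assuming the data.table package is installed and that mydf matches the question's table), the summed column can be named directly inside the call:

```r
library(data.table)

# Example data from the question (assumed to be numeric columns).
mydf <- data.frame(A = c(1, 1, 2, 3, 3),
                   B = c(2, 3, 3, 5, 6))

# Convert to a data.table; as.data.table() makes a copy of mydf.
DT <- as.data.table(mydf)

# .() is data.table's alias for list(); naming the element names the column.
res_dt <- DT[, .(B = sum(B)), by = A]
print(res_dt)
```

Groups appear in the order in which they first occur in A.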
