Create columns from factors and count

Question

A seemingly easy problem is keeping me very busy.

I have a data frame:

> df1
  Name Score
1  Ben     1
2  Ben     2
3 John     1
4 John     2
5 John     3

I would like to create a summary of the table like this:

> df2
  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1

So df2 must (i) only show unique "Names" and (ii) create columns from the unique factors in "Score" and (iii) count the number of times a person received said score.

I have tried:

df2 <- ddply(df1, c("Name"), summarise
          ,Score_1 = sum(df1$Score == 1)
          ,Score_2 = sum(df1$Score == 2)
          ,Score_3 = sum(df1$Score == 3))

which produces:

  Name Score_1 Score_2 Score_3
1  Ben       2       2       1
2 John       2       2       1

So my attempt incorrectly counts all occurrences instead of counting "per group".

As per the comments, also tried reshape (possibly just doing it wrong):

> reshape(df1, idvar = "Name", timevar = "Score", direction = "wide")
  Name
1  Ben
3 John

For a start, the "Score" column is missing but worse than that, from my research on reshape, I am not convinced that I am going to get a count of each factor, which is the whole point.

Recommended answer

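You only need to make a slight modification to your code: refer to Score instead of df1$Score inside summarise. df1$Score always refers to the whole column, so every group gets the overall counts, whereas a bare Score is evaluated within each Name group. Writing .(Name) instead of c("Name") is optional; ddply accepts both forms: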

library(plyr)  # provides ddply() and the .() quoting helper
ddply(df1, .(Name), summarise,
      Score_1 = sum(Score == 1),
      Score_2 = sum(Score == 2),
      Score_3 = sum(Score == 3))

which gives:

  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1
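
To see why the original attempt returned the same totals for every name, here is a minimal base R sketch. It uses split() and sapply() as a stand-in for the grouping step; this is only an illustration of the scoping issue, not a description of how plyr works internally. The names by_name, global and per_group are made up for this example.

```r
# Rebuild the example data frame from the question.
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

# Split the data frame into one piece per name.
by_name <- split(df1, df1$Name)

# Referring to df1$Score ignores the current piece `g`
# and counts over the whole data frame every time.
global <- sapply(by_name, function(g) sum(df1$Score == 1))

# Referring to the piece's own Score column counts within the group.
per_group <- sapply(by_name, function(g) sum(g$Score == 1))

global
per_group
```

Here global holds the overall count for both names, while per_group holds one count per person, which is the behaviour the question asks for.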


Other possibilities include:

1. table(df1)

2. The dcast function of the reshape2 package (or data.table, which has the same dcast function):

library(reshape2) # or library(data.table)
dcast(df1, Name ~ paste0("Score_", Score), fun.aggregate = length) 

which gives:

  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1
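
The table(df1) option listed above returns a contingency table rather than a data frame. If the df2 layout from the question is needed, one possible way to convert it in base R is sketched below. The conversion steps and the counts variable are an assumption about what is wanted, not part of the original answer.

```r
# Rebuild the example data frame from the question.
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

# Cross-tabulate Name against Score; each cell is a per-person count.
counts <- table(df1)

# Turn the 2-D table into a wide data frame (one column per score).
df2 <- as.data.frame.matrix(counts)

# Prefix the score columns and move the row names into a Name column.
names(df2) <- paste0("Score_", names(df2))
df2 <- cbind(Name = rownames(df2), df2)
rownames(df2) <- NULL

df2
```

Because table() finds the score values itself, this does not need one hand-written sum() per score.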
