Selecting specific rows etc. using ddply


Question

I have a three-part question based on a data frame of goals scored by soccer players in a season (df below shows example rows):

 Player           Season  Goals
 Teddy Sheringham 1992/3   22
 Les Ferdinand    1992/3   20
 Dean Holdsworth  1992/3   19
 Andy Cole        1993/4   34
 Alan Shearer     1993/4   31
 Chris Sutton     1993/4   25

If I want to obtain the top scorer for each season, I can use

ddply(df, "Season", summarise, maxGoals = max(Goals),
      Player=Player[which.max(Goals)])

Questions:

1) It does not apply in this case, but does this suffice if there are joint top scorers?

2) I would also like to extract the runner-up for each season. I have played around with sorting on Goals in descending order and taking index 2, but have not found a solution.

3) Also, how would I obtain a count for each season based on the number of goals scored? For example, Goals > 20 should give 1 for 1992/3 and 3 for 1993/4 on the above data.

Recommended answer

If there are joint top scorers, that expression reports only one of them: which.max returns the index of the first maximum, so you get whichever tied player appears first in that season's rows of the data frame.
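If all tied players are wanted, one possible approach (not part of the original answer) is to keep every row whose Goals equals the season maximum. The sketch below assumes plyr is installed; the "Example Player" row is a made-up tie added purely for illustration.

```r
library(plyr)

# Example rows from the question, plus one hypothetical player
# ("Example Player") who ties for first place in 1992/3.
df <- data.frame(
  Player = c("Teddy Sheringham", "Les Ferdinand", "Dean Holdsworth",
             "Andy Cole", "Alan Shearer", "Chris Sutton", "Example Player"),
  Season = c("1992/3", "1992/3", "1992/3",
             "1993/4", "1993/4", "1993/4", "1992/3"),
  Goals  = c(22, 20, 19, 34, 31, 25, 22),
  stringsAsFactors = FALSE
)

# For each season, keep all rows that reach the season's maximum,
# so joint top scorers are all retained.
top <- ddply(df, "Season", function(d) d[d$Goals == max(d$Goals), ])
print(top)
```

With this data, 1992/3 yields two rows (both players on 22 goals) and 1993/4 yields one.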

For question 2 (the runner-up):

d = ddply(df, "Season", summarise, SecondPlayer=Player[order(Goals)[length(Goals)-1]])

For question 3 (the count of players with more than 20 goals):

d = ddply(df, "Season", summarise, Count=sum(Goals > 20))
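As a quick check, both ideas can be run together on the example rows from the question. This is a sketch assuming plyr is installed; it uses a descending sort and index 2 for the runner-up, a small variant of the expression above that gives NA rather than an empty result when a season has only one player.

```r
library(plyr)

# Example rows from the question.
df <- data.frame(
  Player = c("Teddy Sheringham", "Les Ferdinand", "Dean Holdsworth",
             "Andy Cole", "Alan Shearer", "Chris Sutton"),
  Season = c("1992/3", "1992/3", "1992/3", "1993/4", "1993/4", "1993/4"),
  Goals  = c(22, 20, 19, 34, 31, 25),
  stringsAsFactors = FALSE
)

res <- ddply(df, "Season", summarise,
             # Second entry after sorting Goals from highest to lowest.
             SecondPlayer = Player[order(Goals, decreasing = TRUE)[2]],
             # Number of players in the season with more than 20 goals.
             Count = sum(Goals > 20))
print(res)
#   Season  SecondPlayer Count
# 1 1992/3 Les Ferdinand     1
# 2 1993/4  Alan Shearer     3
```

The counts match the values expected in the question (1 for 1992/3 and 3 for 1993/4). Note that with tied goal totals the runner-up expression still picks a single player by row order.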
