Subsetting a DataFrame in R by duplicate values for Year, keeping the lowest value for Rating

Question

I have a data frame that looks like this:

> fitchRatings
               Country Month Year FitchLongTerm LongTermTransformed
1            Abu Dhabi     7 2007            AA                  22
2               Angola     5 2012           BB-                  12
3               Angola     5 2011           BB-                  12
4               Angola     5 2010            B+                  11
5            Argentina     7 2010             B                  10
6            Argentina    12 2008            RD                   3
7            Argentina     8 2006            RD                   3
8            Argentina    12 2005            RD                   3
9            Argentina     6 2005           DDD                   2
10           Argentina     1 2005             D                   0

As you can see, some countries have multiple observations for a single year. I want to subset the data frame so that only one observation remains for each country-year pair: the one with the smallest value of "LongTermTransformed".

In this data set, Country and LongTermTransformed are factors and Year is an integer.

Thanks.

Recommended answer

There are many ways to keep, within each group, the row that has the minimum value in a column. One option is to convert 'LongTermTransformed' from factor to numeric and then use which.min to get the index of the minimum value in each Country/Year group. Because the column is a factor, it has to go through as.character first; calling as.numeric directly on a factor returns its internal level codes rather than the numbers shown. slice then keeps the row identified by that index.

library(dplyr)

fitchRatings %>%
  group_by(Country, Year) %>%
  # convert the factor labels to numbers, then keep the row with the smallest one
  slice(which.min(as.numeric(as.character(LongTermTransformed))))
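The factor-to-numeric conversion used here is easy to get wrong, so a small illustration may help. The sketch below uses a made-up three-element factor `f` (not taken from the question's data frame) to show how `as.numeric()` on a factor differs from converting through the labels; it assumes nothing beyond base R.

```r
# Hypothetical factor with numeric-looking labels, like LongTermTransformed
f <- factor(c("10", "3", "12"))

# Levels are sorted as strings: "10" "12" "3"
codes  <- as.numeric(f)                # internal level codes: 1 3 2
values <- as.numeric(as.character(f))  # the labels as numbers: 10 3 12
same   <- as.numeric(levels(f))[f]     # equivalent idiom, also 10 3 12

which.min(codes)   # 1, which points at "10" (not the smallest rating)
which.min(values)  # 2, which points at "3" (the smallest rating)
```

Skipping the conversion would therefore silently pick the wrong row whenever the string order of the labels differs from their numeric order.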

Alternatively, we can do something similar with data.table. The differences are that we first convert the 'data.frame' to a 'data.table' with setDT, and that we subset .SD (the subset of data for each group) instead of calling slice.

library(data.table)  # v1.9.5+
# setDT() converts the data.frame to a data.table by reference
setDT(fitchRatings)[,
  .SD[which.min(as.numeric(levels(LongTermTransformed))[LongTermTransformed])],
  by = .(Country, Year)]
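For readers without either package installed, the same result can probably be reached in base R. The following is only a sketch and is not part of the original answer: it rebuilds the ten rows printed in the question (assuming the column types described there), sorts by Country, Year and the numeric rating, and keeps the first row of each Country/Year pair with `duplicated()`. The names `rating`, `sorted` and `lowest` are introduced here for illustration.

```r
# Sample rows copied from the question; Country and LongTermTransformed are factors
fitchRatings <- data.frame(
  Country = factor(c("Abu Dhabi", "Angola", "Angola", "Angola", "Argentina",
                     "Argentina", "Argentina", "Argentina", "Argentina",
                     "Argentina")),
  Month = c(7L, 5L, 5L, 5L, 7L, 12L, 8L, 12L, 6L, 1L),
  Year = c(2007L, 2012L, 2011L, 2010L, 2010L, 2008L, 2006L, 2005L, 2005L, 2005L),
  FitchLongTerm = c("AA", "BB-", "BB-", "B+", "B", "RD", "RD", "RD", "DDD", "D"),
  LongTermTransformed = factor(c(22, 12, 12, 11, 10, 3, 3, 3, 2, 0))
)

# Convert the factor labels to numbers before comparing them
rating <- as.numeric(as.character(fitchRatings$LongTermTransformed))

# Sort so that the lowest rating comes first within each Country/Year pair
sorted <- fitchRatings[order(fitchRatings$Country, fitchRatings$Year, rating), ]

# Keep only the first (lowest-rated) row of each Country/Year pair
lowest <- sorted[!duplicated(sorted[c("Country", "Year")]), ]
print(lowest)
```

With the sample rows, the three Argentina observations for 2005 collapse to the single row rated D. Unlike the dplyr and data.table versions, this variant returns the rows sorted by Country and Year rather than in group order of first appearance.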
