Subsetting DataFrame in R by duplicate values for Year by lowest value for Rating
Question
I have a data frame that looks like this:
> fitchRatings
     Country Month Year FitchLongTerm LongTermTransformed
1  Abu Dhabi     7 2007            AA                  22
2     Angola     5 2012           BB-                  12
3     Angola     5 2011           BB-                  12
4     Angola     5 2010            B+                  11
5  Argentina     7 2010             B                  10
6  Argentina    12 2008            RD                   3
7  Argentina     8 2006            RD                   3
8  Argentina    12 2005            RD                   3
9  Argentina     6 2005           DDD                   2
10 Argentina     1 2005             D                   0
As you can see, some countries have multiple observations for a single year. I want to subset the data frame so that only one observation remains for each country-year: the one with the smallest value of "LongTermTransformed".
In this data set, Country and LongTermTransformed are factors and Year is an integer.
Thanks.
Recommended answer
There are many ways to subset rows by the minimum value of a column within groups. One option is to convert 'LongTermTransformed' to numeric, get the index of the minimum value in each group with which.min, and use slice to keep the row at that index.
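library(dplyr)

# Within each Country-Year group, keep the row with the smallest rating.
# LongTermTransformed is a factor, so convert via character before comparing.
fitchRatings %>%
  group_by(Country, Year) %>%
  slice(which.min(as.numeric(as.character(LongTermTransformed))))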
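The conversion through as.character matters because 'LongTermTransformed' is a factor. A minimal base R sketch of the difference follows; the vector x is a hypothetical stand-in for that column, built from the values shown in the question.

```r
# Hypothetical factor holding the rating codes from the question's sample data
x <- factor(c("22", "12", "11", "10", "3", "2", "0"))

# as.numeric() on a factor returns the internal level indices, not the labels
codes <- as.numeric(x)

# Converting through character recovers the values as printed
values <- as.numeric(as.character(x))

# Equivalent idiom: convert the levels once, then index them by the factor
values2 <- as.numeric(levels(x))[x]

print(codes)
print(values)
```

Calling which.min on codes could pick a different row than calling it on values, which is why both snippets in this answer convert the factor first.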
Or we can use a similar approach with data.table. The difference is that we convert the 'data.frame' to a 'data.table' with setDT and subset .SD by the index.
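library(data.table)  # v1.9.5+

# setDT() converts the data.frame to a data.table by reference;
# .SD is the subset of rows for the current Country-Year group.
setDT(fitchRatings)[,
  .SD[which.min(as.numeric(levels(LongTermTransformed))[LongTermTransformed])],
  by = .(Country, Year)]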
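For comparison, here is a base R sketch of the same idea that needs no extra packages. It is not one of the answer's methods: it swaps which.min for sorting with order and dropping later rows with duplicated. The data frame is rebuilt by hand from the printout in the question, so the construction code and the names rating, sorted, and result are assumptions made for illustration.

```r
# Sample data re-typed from the question (construction is an assumption)
fitchRatings <- data.frame(
  Country = factor(c("Abu Dhabi", "Angola", "Angola", "Angola",
                     rep("Argentina", 6))),
  Month = c(7L, 5L, 5L, 5L, 7L, 12L, 8L, 12L, 6L, 1L),
  Year = c(2007L, 2012L, 2011L, 2010L, 2010L, 2008L, 2006L, 2005L, 2005L, 2005L),
  FitchLongTerm = c("AA", "BB-", "BB-", "B+", "B", "RD", "RD", "RD", "DDD", "D"),
  LongTermTransformed = factor(c(22, 12, 12, 11, 10, 3, 3, 3, 2, 0))
)

# Convert the factor to its numeric values before comparing
rating <- as.numeric(as.character(fitchRatings$LongTermTransformed))

# Sort so that the lowest rating comes first within each Country-Year
sorted <- fitchRatings[order(fitchRatings$Country, fitchRatings$Year, rating), ]

# Keep only the first row of each Country-Year combination
result <- sorted[!duplicated(sorted[c("Country", "Year")]), ]
print(result)
```

With this sample, the three Argentina rows for 2005 collapse to the single row rated D, and every other country-year keeps its only row.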