Find all largest values in a range, across different objects in a data frame

Problem description

I wonder if there is a simpler way than writing if...else... for the following case. I have a data frame and I only want the rows whose number in column "percentage" is >= 95. Moreover, for each object, if there are multiple rows fitting this criterion, I only want the largest one(s). If there is more than one largest row, I would like to keep all of them.

For example:

object  city  street      percentage
A       NY    Sun         100
A       NY    Malino      97
A       NY    Waterfall   100
B       CA    Washington  98
B       WA    Lieber      95
C       NA    Moon        75

Then I would like the result to be:

object  city  street      percentage
A       NY    Sun         100
A       NY    Waterfall   100
B       CA    Washington  98

I am able to do it with if...else statements, but I feel there should be a smarter way to say:
1. keep the rows with percentage >= 95;
2. if an object has more than one such row, choose the largest;
3. if more than one row ties for the largest, choose them all.

Recommended answer

You can do this by creating a variable that indicates the rows that have the maximum percentage for each object. We can then use this indicator, together with the >= 95 condition, to subset the data.

# your data
dat <- read.table(text = "object  city    street  percentage
A   NY  Sun 100
A   NY  Malino  97
A   NY  Waterfall   100
B   CA  Washington  98
B   WA  Lieber  95
C   NA  Moon    75", header=TRUE, na.strings="", stringsAsFactors=FALSE)

# create an indicator to identify the rows that have the maximum
# percentage by object
id <- with(dat, ave(percentage, object, FUN=function(i) i==max(i)) )

# subset your data - keep rows that are greater than or equal to 95 and
# have the maximum group percentage (given by id equal to one)
dat[dat$percentage >= 95 & id , ]

This works because the & operator combines the two conditions into a single logical vector, which is then used to subset the rows of dat.

dat$percentage >= 95 & id
#[1] TRUE FALSE  TRUE  TRUE FALSE FALSE
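The & step works even though id is not itself a logical vector. As far as I can tell from how base R's ave() is written, it writes FUN's results back into a copy of its first argument, so the TRUE/FALSE flags end up stored as 1/0 in the (integer) percentage column's type. The small check below, on the same data, shows this; the name keep is illustrative and not part of the original answer.

```r
# Same example data as in the question
dat <- read.table(text = "object city street percentage
A NY Sun 100
A NY Malino 97
A NY Waterfall 100
B CA Washington 98
B WA Lieber 95
C NA Moon 75", header = TRUE, na.strings = "", stringsAsFactors = FALSE)

# ave() fills a copy of `percentage`, so the logical flags are coerced
# to 1/0 in an integer vector rather than kept as TRUE/FALSE
id <- with(dat, ave(percentage, object, FUN = function(i) i == max(i)))
print(id)         # [1] 1 0 1 1 0 1
print(class(id))  # [1] "integer"

# `&` coerces numbers to logical (0 -> FALSE, non-zero -> TRUE), which is
# why the subset works; converting explicitly makes the intent clearer
keep <- dat$percentage >= 95 & as.logical(id)
dat[keep, ]
```

Writing as.logical(id) is optional, but it documents that id is meant as a flag rather than a count.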

Or, putting it all together:

with(dat, dat[percentage >= 95 &
                ave(percentage, object, FUN=function(i) i==max(i)), ])

#   object city     street percentage
# 1      A   NY        Sun        100
# 3      A   NY  Waterfall        100
# 4      B   CA Washington         98
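The answer takes each object's maximum over all of its rows and then applies the >= 95 filter. This gives the same rows as filtering first, because an object's maximum passes the filter exactly when any of its rows do. If you prefer code that mirrors the three steps in the question literally (filter, then keep every row tied for the group maximum), a minimal base R sketch follows. keep_group_max and its arguments are hypothetical names chosen for illustration, not part of any package.

```r
# Same example data as in the question
dat <- read.table(text = "object city street percentage
A NY Sun 100
A NY Malino 97
A NY Waterfall 100
B CA Washington 98
B WA Lieber 95
C NA Moon 75", header = TRUE, na.strings = "", stringsAsFactors = FALSE)

# Hypothetical helper: per group, keep every row whose value is >= threshold
# and equal to that group's maximum; tied maxima are all kept
keep_group_max <- function(df, value, group, threshold = 95) {
  # Step 1: drop rows below the threshold
  df <- df[df[[value]] >= threshold, , drop = FALSE]
  if (nrow(df) == 0) return(df)  # nothing left to compare
  # Steps 2 and 3: compare each row with its group's maximum; `==` keeps ties
  grp_max <- ave(df[[value]], df[[group]], FUN = max)
  df[df[[value]] == grp_max, , drop = FALSE]
}

res <- keep_group_max(dat, "percentage", "object")
res
```

On the example data this returns the same three rows (1, 3 and 4) as the answer. Passing column names as strings keeps the helper reusable for other columns or thresholds, for example keep_group_max(dat, "percentage", "object", threshold = 97).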
