Subset based on frequency level


Question

I want to build a df containing only the rows of df1 whose "ID" value occurs more often than a threshold stored in a variable called cutoff. For this example I set cutoff to 9, so I want the rows of df1 whose ID value is associated with more than 9 rows. The last line of my code generates a df that I don't understand. The correct df would have 24 rows, all with either a 4 or a 5 in the ID column (those IDs have 11 and 13 rows). Can someone explain what my last line of code is actually doing and suggest a different approach?

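library(plyr)  # count() with a "freq" column is assumed to be plyr::count
set.seed(123)
ID <- rep(c(1, 2, 3, 4, 5), times = c(5, 7, 9, 11, 13))
sub1 <- rnorm(45)
sub2 <- rnorm(45)
df1 <- data.frame(ID, sub1, sub2)
IDfreq <- count(df1, "ID")  # one row per distinct ID, row count in freq
cutoff <- 9
df2 <- subset(df1, subset = (IDfreq$freq > cutoff))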

Recommended answer

df1[df1$ID %in% names(table(df1$ID))[table(df1$ID) > 9], ]

This tests, for each row, whether its df1$ID value belongs to a category that has more than 9 rows. The result is a logical vector with one element per row, TRUE where the ID qualifies. Passed as the "i" argument, that vector makes the [ function return the matching rows in full, since the "j" argument is left empty.
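
The one-liner can be easier to follow when split into steps. The sketch below rebuilds the question's data in base R and names each intermediate result; the names counts, keep_ids and keep_row are illustrative only and do not appear in the original post.

```r
# Rebuild the example data from the question (base R only).
set.seed(123)
ID <- rep(c(1, 2, 3, 4, 5), times = c(5, 7, 9, 11, 13))
sub1 <- rnorm(45)
sub2 <- rnorm(45)
df1 <- data.frame(ID, sub1, sub2)
cutoff <- 9

# Step 1: number of rows per ID (5, 7, 9, 11, 13 for IDs 1 to 5).
counts <- table(df1$ID)

# Step 2: the IDs whose count exceeds the cutoff, as character names.
keep_ids <- names(counts)[counts > cutoff]

# Step 3: one TRUE/FALSE per row of df1; %in% compares the numeric ID
# column with the character names by coercing both to character.
keep_row <- df1$ID %in% keep_ids

# Step 4: logical "i", empty "j": keep matching rows with all columns.
df2 <- df1[keep_row, ]

nrow(df2)        # 24
unique(df2$ID)   # 4 5
```

The key point is that keep_row has the same length as nrow(df1), so each row is judged by its own ID.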

See:

?`[`
?'%in%'
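
As for what the question's last line does: the condition IDfreq$freq > cutoff has one element per distinct ID (5 elements), not one per row (45), and logical row indexing recycles a shorter vector. The sketch below reproduces that effect without plyr and shows one possible alternative using base R's ave(); the names freq, short_cond, recycled and n_per_row are illustrative, not from the original post.

```r
set.seed(123)
ID <- rep(c(1, 2, 3, 4, 5), times = c(5, 7, 9, 11, 13))
sub1 <- rnorm(45)
sub2 <- rnorm(45)
df1 <- data.frame(ID, sub1, sub2)
cutoff <- 9

# Same information as IDfreq$freq in the question: one count per ID.
freq <- as.vector(table(df1$ID))   # 5 7 9 11 13
short_cond <- freq > cutoff        # FALSE FALSE FALSE TRUE TRUE

# subset() applies the 5-element vector cyclically over the 45 rows, so it
# keeps rows 4, 5, 9, 10, 14, 15, ... by position, regardless of their ID.
recycled <- subset(df1, subset = short_cond)
nrow(recycled)   # 18

# Alternative: compute the group size for every row, then filter on it.
n_per_row <- ave(seq_along(df1$ID), df1$ID, FUN = length)
df2 <- df1[n_per_row > cutoff, ]
nrow(df2)        # 24
```

Either approach works as long as the condition ends up with one element per row of df1.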
