Subset based on frequency level

Views: 53 Published: 2020/11/10 23:02:03 Tags: r subset frequency

Problem description

I want to generate a df that keeps only the rows whose "ID" occurs more often than a threshold stored in a variable called cutoff. For this example, I set the cutoff to 9, meaning that I want to select the rows in df1 whose ID value is associated with more than 9 rows. The last line of my code generates a df that I don't understand. The correct df would have 24 rows, all with either a 4 or a 5 in the ID column (those IDs have 11 and 13 rows). Can someone explain what my last line of code is actually doing and suggest a different approach?

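library(plyr)  # count() below is assumed to be plyr::count(), which returns a "freq" column
set.seed(123)
# 45 rows in total: ID 1 has 5 rows, ID 2 has 7, ID 3 has 9, ID 4 has 11, ID 5 has 13
ID <- rep(c(1, 2, 3, 4, 5), times = c(5, 7, 9, 11, 13))
sub1 <- rnorm(45)
sub2 <- rnorm(45)
df1 <- data.frame(ID, sub1, sub2)
# One row per distinct ID, with its number of occurrences in "freq"
IDfreq <- count(df1, "ID")
cutoff <- 9
# The line in question: the condition has 5 elements, but df1 has 45 rows
df2 <- subset(df1, subset = (IDfreq$freq > cutoff))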

Recommended answer

df1[df1$ID %in% names(table(df1$ID))[table(df1$ID) > 9], ]

This tests whether each df1$ID value belongs to a category with more than 9 rows. Where it does, the corresponding element of the resulting logical vector is TRUE. Passing that vector as the "i" argument makes the [ function return the entire row, because the "j" argument is left empty.
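
The one-liner can be easier to follow when split into steps. The following is a minimal sketch that rebuilds the question's data in base R; the intermediate names id_counts, keep_ids, and keep_row are illustrative and do not appear in the original answer.

```r
# Rebuild the example data from the question (base R only).
set.seed(123)
ID  <- rep(c(1, 2, 3, 4, 5), times = c(5, 7, 9, 11, 13))
df1 <- data.frame(ID, sub1 = rnorm(45), sub2 = rnorm(45))
cutoff <- 9

# Step 1: count rows per ID. The names of the table are the ID values as strings.
id_counts <- table(df1$ID)

# Step 2: keep the names of the IDs that occur more than `cutoff` times.
keep_ids <- names(id_counts)[id_counts > cutoff]

# Step 3: build one TRUE/FALSE per row of df1 and use it as the row index.
# %in% compares the numeric IDs with the character names after coercion.
keep_row <- df1$ID %in% keep_ids
df2 <- df1[keep_row, ]

nrow(df2)        # 24
unique(df2$ID)   # 4 5
```

Using the cutoff variable in place of the literal 9 keeps the threshold in one place if it changes later.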

See:

?`[`
?'%in%'
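
As for what the question's last line does: IDfreq$freq > cutoff has one element per distinct ID (5 here), not one per row (45), so the logical index is recycled across the rows of df1. The sketch below illustrates that behaviour under a few assumptions: the counts come from table() instead of count() so that no extra package is needed, and row_id, cond, and per_row_n are illustrative names. The last step uses ave(), swapped in here as one possible way to get a count per row; it is not part of the original answer.

```r
ID  <- rep(c(1, 2, 3, 4, 5), times = c(5, 7, 9, 11, 13))
df1 <- data.frame(ID, row_id = seq_along(ID))

# One count per distinct ID: 5 7 9 11 13
freq <- as.vector(table(df1$ID))
cond <- freq > 9   # FALSE FALSE FALSE TRUE TRUE, length 5

# A length-5 logical index is recycled over the 45 rows, so rows
# 4, 5, 9, 10, 14, 15, ... are kept regardless of their ID.
recycled <- subset(df1, subset = cond)
nrow(recycled)     # 18

# A per-row condition needs one value per row, e.g. the size of each row's ID group.
per_row_n <- ave(df1$ID, df1$ID, FUN = length)
fixed <- df1[per_row_n > 9, ]
nrow(fixed)        # 24
```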
