subset() a factor by its number of observations


Question

I have a problem with the subset() function. How can I subset my data frame so that it keeps only the levels of a factor that have a given number of observations?

   NAME      CLASS   COLOR    VALUE
   antonio   B       YELLOW       5
   antonio   B       BLUE         8
   antonio   B       BLUE         7
   antonio   B       BLUE        12
   luca      C       YELLOW      99
   luca      B       YELLOW      87
   luca      B       YELLOW      98
   giovanni  A       BLUE        48

I would like to keep only the rows where the same combination of the three factors "NAME", "CLASS" and "COLOR" occurs at least three times, so that I can compute the mean of VALUE. In this case I would obtain:

   NAME      CLASS   COLOR    VALUE
   antonio   B       BLUE     mean

because antonio / B / BLUE is the only combination of the three factors with at least three observations.

Many thanks,

Nick

Recommended answer

You can use the table function as follows:

# keep only the rows whose FACTOR level occurs at least 3 times
subset(df, table(FACTOR)[FACTOR] >= 3)
#    FACTOR VALUE
# 1 ANTONIO     5
# 2 ANTONIO     8
# 3 ANTONIO     7

To help you understand, see what each of these returns:

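table(df$FACTOR)                   # number of rows per level
table(df$FACTOR)[df$FACTOR]        # that count, repeated for every row
table(df$FACTOR)[df$FACTOR] >= 3   # TRUE for rows whose level has at least 3 rows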
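
To make those three calls concrete, here is a minimal sketch on a made-up data frame. The data frame `df` and its values are assumptions for illustration only, not the asker's actual data.

```r
# Hypothetical single-factor data frame (values are illustrative only)
df <- data.frame(
  FACTOR = c("ANTONIO", "ANTONIO", "ANTONIO", "LUCA", "LUCA", "GIOVANNI"),
  VALUE  = c(5, 8, 7, 99, 87, 48),
  stringsAsFactors = TRUE
)

counts  <- table(df$FACTOR)        # one count per level of FACTOR
per_row <- counts[df$FACTOR]       # look up each row's level count
keep    <- as.vector(per_row >= 3) # plain logical vector, one entry per row

result <- df[keep, ]               # same rows that subset() would keep
print(result)
```

The lookup step works because the entries of the table are in the same order as the levels of the factor, so indexing by the factor returns each row's own count.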


You could also use the ave function to compute the number of observations:

subset(df, ave(VALUE, FACTOR, FUN = length) >= 3)
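
As a rough sketch of what ave does here, the example below computes the per-row group size separately before filtering. The data frame `df` and the helper variable `n_per_row` are assumptions introduced for illustration.

```r
# Hypothetical single-factor data frame (values are illustrative only)
df <- data.frame(
  FACTOR = c("ANTONIO", "ANTONIO", "ANTONIO", "LUCA", "LUCA", "GIOVANNI"),
  VALUE  = c(5, 8, 7, 99, 87, 48)
)

# ave() splits VALUE by FACTOR, applies length() to each piece, and
# writes the result back in the original row order
n_per_row <- ave(df$VALUE, df$FACTOR, FUN = length)

result <- subset(df, n_per_row >= 3)
print(result)
```

With FUN = length the values in VALUE are not actually used; the column only serves as a vector of the right length to split by group.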

This last method may be a little more flexible if you have multiple factors, as you asked in your comment and updated question. You can do:

subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)
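
The question also asks for the mean of VALUE per combination. One possible way to finish the job, sketched here with the data from the question, is to pass the filtered rows to aggregate. The use of aggregate and the names `kept` and `means` are assumptions for illustration, not part of the original answer.

```r
# Data frame rebuilt from the table in the question
df <- data.frame(
  NAME  = c("antonio", "antonio", "antonio", "antonio",
            "luca", "luca", "luca", "giovanni"),
  CLASS = c("B", "B", "B", "B", "C", "B", "B", "A"),
  COLOR = c("YELLOW", "BLUE", "BLUE", "BLUE",
            "YELLOW", "YELLOW", "YELLOW", "BLUE"),
  VALUE = c(5, 8, 7, 12, 99, 87, 98, 48)
)

# Keep rows whose NAME/CLASS/COLOR combination occurs at least 3 times
kept <- subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)

# Mean of VALUE for each remaining combination
means <- aggregate(VALUE ~ NAME + CLASS + COLOR, data = kept, FUN = mean)
print(means)
```

On this data only the antonio / B / BLUE combination survives the filter, so `means` has a single row.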
