subset() a factor by its number of observations
Question
I have a problem with the subset() function. How can I subset a data frame by the number of observations of a factor?
NAME CLASS COLOR VALUE
antonio B YELLOW 5
antonio B BLUE 8
antonio B BLUE 7
antonio B BLUE 12
luca C YELLOW 99
luca B YELLOW 87
luca B YELLOW 98
giovanni A BLUE 48
I would like to keep only the rows where the same combination of the three factors "NAME", "CLASS" and "COLOR" appears at least three times, so that I can compute the mean of VALUE. In this case I would obtain:
NAME CLASS COLOR VALUE
antonio B BLUE mean
because antonio is the only one with at least three observations for the same combination of factors.
Thanks a lot,
Nick
Recommended answer
You can use the table function as follows:
subset(df, table(FACTOR)[FACTOR] >= 3)
# FACTOR VALUE
# 1 ANTONIO 5
# 2 ANTONIO 8
# 3 ANTONIO 7
To help you understand, see what each of these returns:
table(df$FACTOR)
table(df$FACTOR)[df$FACTOR]
table(df$FACTOR)[df$FACTOR] >= 3
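As a rough illustration of those three steps, here is a small sketch that uses the NAME column from the question in place of the placeholder FACTOR. The variable names counts, per_row and keep are only illustrative and do not come from the original answer.

```r
# NAME column copied from the question's example data
NAME <- c("antonio", "antonio", "antonio", "antonio",
          "luca", "luca", "luca", "giovanni")

# Step 1: number of rows for each distinct name
counts <- table(NAME)

# Step 2: look up that count for every row, indexing the table by name
per_row <- counts[NAME]

# Step 3: logical vector marking rows whose name occurs at least 3 times
keep <- as.vector(per_row >= 3)
print(keep)
```

The resulting logical vector is what subset() uses to select rows: here every row is kept except the single giovanni row.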
You could also use the ave function to compute the number of observations:
subset(df, ave(VALUE, FACTOR, FUN = length) >= 3)
This last method may be a little more flexible if you have multiple factors, as in your comment and updated question. You can do:
subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)
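The question also asks for the mean of VALUE once the rows are filtered. One possible way to finish the job, sketched here with the question's data, is to pass the filtered rows to aggregate(). The use of aggregate() and the names kept and result are assumptions for illustration, not part of the original answer.

```r
# Example data copied from the question
df <- data.frame(
  NAME  = c("antonio", "antonio", "antonio", "antonio",
            "luca", "luca", "luca", "giovanni"),
  CLASS = c("B", "B", "B", "B", "C", "B", "B", "A"),
  COLOR = c("YELLOW", "BLUE", "BLUE", "BLUE",
            "YELLOW", "YELLOW", "YELLOW", "BLUE"),
  VALUE = c(5, 8, 7, 12, 99, 87, 98, 48)
)

# Keep rows whose NAME/CLASS/COLOR combination occurs at least three times
kept <- subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)

# Average VALUE within each remaining combination
result <- aggregate(VALUE ~ NAME + CLASS + COLOR, data = kept, FUN = mean)
print(result)
```

With this data only the antonio / B / BLUE combination has three rows (values 8, 7 and 12), so result contains that single row with a mean of 9.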