Most frequent in each column in dataframe


Problem description

What do you do if you want to find the maximum frequency for each column in a data frame and return the factor, category, and frequency?

So I have the code as follows:

dfreqcommon = data.frame()

for (i in 1:ncol(diamonds)){

dfc = data.frame(t(table(diamonds[,i])))
dfc$Var1 = names(diamonds)[i]

dfreqcommon = rbind(dfreqcommon, dfc)

}

names(dfreqcommon) = c("Factors","Categories","Frequency")

dfreqcommon

But this seems to return all factors, categories, and frequencies. I just want the maximum frequency for each factor, along with its category. I tried changing dfc to

dfc = data.frame(max(t(table(diamonds[,i]))))

But it doesn't show the categories. Is there any way to fix this?
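
To illustrate why the category disappears, here is a minimal sketch on a toy vector (the vector `x` is made up for illustration and is not part of the diamonds data). `max()` returns only the bare count, while `which.max()` returns the position of the largest count and keeps the category as its name:

```r
# Toy vector standing in for one column (hypothetical data)
x <- c("a", "b", "b", "c", "b", "a")
tab <- table(x)

max(tab)                # the largest count only; the category name is dropped

top <- which.max(tab)   # position of the largest count, named by its category
names(top)              # the most frequent category
tab[[top]]              # its frequency
```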

Solution

Another way, with base R:

library(ggplot2) # only to get the diamonds data.frame

data.frame(Factors=colnames(diamonds), 
           t(sapply(diamonds, # apply following function to each column
                    function(x) {
                        t_x <- sort(table(x), decreasing=TRUE) # get the frequencies and sort them in decreasing order
                        list(Categories=names(t_x)[1], # name of the value with highest frequency
                             Frequency=t_x[1]) # highest frequency
                    })))
#        Factors Categories Frequency
#carat     carat        0.3      2604
#cut         cut      Ideal     21551
#color     color          G     11292
#clarity clarity        SI1     13065
#depth     depth         62      2239
#table     table         56      9881
#price     price        605       132
#x             x       4.37       448
#y             y       4.34       437
#z             z        2.7       767
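
The `sapply` call above builds its result from lists, so the `Categories` and `Frequency` columns end up as list columns. If plain character and integer columns are preferred, a variant along the following lines may work. This is only a sketch: the helper name `most_frequent` and the toy data frame are made up for illustration, it assumes every column has at least one non-missing value, and ties go to the first category in `table()` order.

```r
# Hypothetical helper: one row per column with its most frequent value
most_frequent <- function(df) {
  rows <- lapply(names(df), function(col) {
    tab <- table(df[[col]])   # frequencies; NA values are excluded by default
    top <- which.max(tab)     # first category with the largest count
    data.frame(Factors = col,
               Categories = names(top),
               Frequency = as.integer(tab[[top]]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)        # stack the one-row data frames
}

# Toy data frame (not the diamonds data)
toy <- data.frame(g = c("a", "b", "b"), n = c(1, 1, 2))
res <- most_frequent(toy)
res
```

With ggplot2 loaded, the same helper could be called as `most_frequent(diamonds)`.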
