Most frequent value in each column of a data frame
Problem description
How can I find the maximum frequency in each column of a data frame and return, for each column, the factor (column name), the category, and its frequency?
My code so far is:
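library(ggplot2)  # provides the diamonds data set
dfreqcommon = data.frame()
for (i in 1:ncol(diamonds)){
  # [[i]] extracts the column as a plain vector; diamonds is a tibble, where [, i] stays a one-column tibble
  dfc = data.frame(t(table(diamonds[[i]])))
  dfc$Var1 = names(diamonds)[i]
  dfreqcommon = rbind(dfreqcommon, dfc)
}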
names(dfreqcommon) = c("Factors","Categories","Frequency")
dfreqcommon
But this returns every category and frequency for every factor. I only want the maximum frequency for each factor, together with its category. I tried changing dfc to
dfc = data.frame(max(t(table(diamonds[,i]))))
But it doesn't show the categories. Is there any way to fix this?
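A small illustration of why the categories disappear, assuming a hypothetical toy vector `x` in place of one `diamonds` column: `max()` on a table returns only the largest count, while `which.max()` returns the position of that count, named by its category, so `names()` can recover the category.

```r
# Hypothetical toy vector standing in for one column of diamonds
x <- c("Ideal", "Premium", "Ideal", "Good", "Ideal", "Premium")
tab <- table(x)                   # counts per category: Good 1, Ideal 3, Premium 2

# max() keeps only the largest count; the category name is lost
top_count <- max(tab)

# which.max() gives the position of the (first) largest count
idx <- which.max(tab)
top_category <- names(tab)[idx]   # most frequent category
top_freq <- tab[[idx]]            # its frequency

cat(top_category, top_freq, "\n")
```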
Solution
Another way, with base R:
library(ggplot2) # only to get the diamonds data.frame
data.frame(Factors = colnames(diamonds),
           t(sapply(diamonds,  # apply the following function to each column
                    function(x) {
                      t_x <- sort(table(x), decreasing = TRUE)  # get the frequencies and sort them in decreasing order
                      list(Categories = names(t_x)[1],  # name of the value with the highest frequency
                           Frequency = t_x[1])          # highest frequency
                    })))
#         Factors Categories Frequency
# carat     carat        0.3      2604
# cut         cut      Ideal     21551
# color     color          G     11292
# clarity clarity        SI1     13065
# depth     depth         62      2239
# table     table         56      9881
# price     price        605       132
# x             x       4.37       448
# y             y       4.34       437
# z             z        2.7       767
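For comparison, a sketch that keeps the question's one-row-per-column idea but uses `which.max()` so the category is kept. It assumes a hypothetical toy data frame `df` in place of `diamonds` and a hypothetical helper name `most_frequent`. Unlike the `t(sapply(...))` construction, which yields a list-mode matrix and can therefore give list columns, it builds ordinary character and integer columns, and it extracts each column with `[[`, which returns a plain vector for both data frames and tibbles.

```r
# Hypothetical toy data frame standing in for ggplot2::diamonds
df <- data.frame(
  cut   = c("Ideal", "Premium", "Ideal", "Good", "Ideal"),
  color = c("G", "E", "G", "E", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per column with its most frequent value
most_frequent <- function(data) {
  rows <- lapply(names(data), function(col) {
    tab <- table(data[[col]])  # [[ ]] yields a plain vector, even for tibbles
    idx <- which.max(tab)      # position of the first highest count
    data.frame(Factors    = col,
               Categories = names(tab)[idx],
               Frequency  = as.integer(tab[idx]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)         # stack the one-row data frames
}

res <- most_frequent(df)
print(res)
```

On the real data this would be called as `most_frequent(ggplot2::diamonds)`. Ties go to the first value in `table()` order, because `which.max()` returns the first maximum, and `NA` values are not counted since `table()` drops them by default.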