How can I replace a factor's levels with the top n levels (by some metric), plus [other]?


Question


For a factor with more than a sensible number of levels to color in a graph, I want to replace any levels that are not in the 'top 10' with 'other'.

Alternate question: How do I reduce my factor levels to the number that RColorBrewer can plot as separate colors?

For example, if I want to plot the number of home runs per decade from the baseball data:

require(ggplot2)
require(plyr)  # the baseball data set ships with plyr
qplot(data=baseball,10*year%/%10,hr,
  stat="identity",geom="bar")

Perhaps I'd like to see what teams contributed to this:

qplot(data=baseball,10*year%/%10,hr,
  fill=team,
  stat="identity",geom="bar")

This creates too many color levels! The colors are so similar you can't distinguish them, and there are so many they won't fit on the screen.

I'd really like to see the top X (7) teams (by total home run count) and then the rest all lumped together in a single category/color called 'other'.

Let's imagine we have a function called hotfactor which knows how to do this:

hotfactor <- function(afactor,orderby,n) { ??? }

qplot(data=baseball,10*year%/%10,hr,
  fill=hotfactor(factor(team),hr,n=7),
  stat="identity",geom="bar") + 
  scale_fill_brewer(name="team",palette="Dark2")

So what can I use for 'hotfactor'?

Solution

So after going through several iterations and searching the web, I have created this nice short one.

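hotfactor <- function(fac,by,n=10,o="other") {
   # sum `by` per level, rank the totals from largest (1) to smallest,
   # then rename every level ranked below the top n to `o`
   levels(fac)[rank(-xtabs(by~fac))[levels(fac)]>n] <- o
   fac
}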

It's great for summarising data, and you can use it to access the great RColorBrewer color schemes (each of which has a limited number of carefully selected colors).
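To see what the one-liner does, here is a minimal sketch on made-up data. The team names and home-run counts are hypothetical, and hotfactor is repeated only so the snippet runs on its own.

```r
# hotfactor as given in the answer, repeated so this example is self-contained
hotfactor <- function(fac, by, n = 10, o = "other") {
  levels(fac)[rank(-xtabs(by ~ fac))[levels(fac)] > n] <- o
  fac
}

# Hypothetical toy data: five teams with made-up home-run counts
team <- factor(c("A", "B", "C", "D", "E", "A", "B"))
hr   <- c(10, 20, 1, 2, 30, 5, 5)

# Totals per team are A = 15, B = 25, C = 1, D = 2, E = 30,
# so with n = 2 only E and B keep their own level
top2 <- hotfactor(team, hr, n = 2)
levels(top2)   # "other" "B" "E"
table(top2)    # number of rows that fall into each remaining level
```

Assigning the same name to several levels is how base R merges them, which is why a single assignment to levels(fac) is enough here.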


Usage notes:

fac should be a factor, and the function works best when it has no empty levels. You may want to run droplevels(as.factor(mydata)) first.
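A small sketch of why empty levels matter, using hypothetical data: subsetting a factor keeps levels that no longer occur, and those levels still appear in xtabs with a total of 0, so they can take up slots in the ranking.

```r
# Hypothetical toy data: three teams, then a subset without "CHN"
team <- factor(c("NYA", "BOS", "CHN", "NYA"))
hr   <- c(12, 7, 3, 5)

keep     <- team != "CHN"
team_sub <- team[keep]
hr_sub   <- hr[keep]

levels(team_sub)           # "CHN" is still listed as a level
xtabs(hr_sub ~ team_sub)   # and it shows up with a total of 0

# Remove levels that no longer occur before calling hotfactor
team_clean <- droplevels(team_sub)
levels(team_clean)         # only the levels that actually occur
```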

It doesn't sort the factor levels. For best results in bar charts, run the following on the output factor.

x <- hotfactor(f,val)
x <- reorder(x,-val,sum)
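As an illustration of the reordering step, here is a sketch on made-up data (the names team and hr and their values are assumptions for the example, and hotfactor is repeated so the snippet runs alone).

```r
# hotfactor as given in the answer, repeated so this example is self-contained
hotfactor <- function(fac, by, n = 10, o = "other") {
  levels(fac)[rank(-xtabs(by ~ fac))[levels(fac)] > n] <- o
  fac
}

# Hypothetical toy data
team <- factor(c("A", "B", "C", "D", "E", "A", "B"))
hr   <- c(10, 20, 1, 2, 30, 5, 5)

x <- hotfactor(team, hr, n = 2)
levels(x)                  # merged level first, kept levels in their original order

# Order the levels by decreasing total of hr
x <- reorder(x, -hr, sum)
levels(x)                  # "E" "B" "other"
```

Note that reorder treats the merged 'other' level like any other level, so it lands wherever its combined total puts it rather than always at the end.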
