R: "Binning" categorical variables


Question


Working in R, I have a data.frame with 13 factor columns. One of the columns contains credit rating data and has 54 different values:

levels(TR_factor$crclscod)
[1] "A"  "A2" "AA" "B"  "B2" "BA" "C"  "C2" "C5" "CA" "CC" "CY" "D" 
[14] "D2" "D4" "D5" "DA" "E"  "E2" "E4" "EA" "EC" "EF" "EM" "G"  "GA"
[27] "GY" "H"  "I"  "IF" "J"  "JF" "K"  "L"  "M"  "O"  "P1" "TP" "U" 
[40] "U1" "V"  "V1" "W"  "Y"  "Z"  "Z1" "Z2" "Z4" "Z5" "ZA" "ZY" 


What I want is to "bin" those categories into something like:

levels(TR_factor$crclscod)
[1] "all A"  "all B"   "all C"  "all D" [...] "all Z"


My attempt was to use some form of a construct like this:

crcls_reduced <- ifelse(TR_factor$crclscod %in% c("A","A2", "AA", "B", "B2","BA", "C" , "C2" ,"C5" ,"CA" ,"CC", "CY", "D",  "D2", "D4", "D5" ,"DA", "E" , "E2", "E4" ,"EA", "EC" ,"EF", "EM", "G" , "GA",  "GY" ,"H", "I",  "IF" ,"J" , "JF" ,"K", "L", "M", "O", "P1","TP", "U", "U1" ,"V",  "V1", "W" , "Y" , "Z" , "Z1", "Z2", "Z4" ,"Z5", "ZA", "ZY"), "A", "B", "C", "D", "E", "G", "H", "I", "J", "K", "L", "M", "O", "P", "T", "U", "V", "W", "Y", "Z")


But of course, ifelse() can only choose between two outcomes, so this construct produces a binary result. I could do the whole thing manually for each letter, but I was hoping there is a faster and more efficient way, for instance a package I am not aware of.

Thanks for any suggestions!

Answer

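You can try either of the following. Both keep only the first character of each code and prefix it with "all"; the first uses a regular expression with sub(), the second uses substr():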

 factor(paste('all', sub('(.).*$', '\\1', v1)))

 factor(paste('all', substr(v1, 1,1)))

Example data (v1 holds the codes listed in the question):

v1 <- c("A", "A2", "AA", "B", "B2", "BA", "C", "C2", "C5", "CA", "CC", 
"CY", "D", "D2", "D4", "D5", "DA", "E", "E2", "E4", "EA", "EC", 
"EF", "EM", "G", "GA", "GY", "H", "I", "IF", "J", "JF", "K", 
"L", "M", "O", "P1", "TP", "U", "U1", "V", "V1", "W", "Y", "Z", 
"Z1", "Z2", "Z4", "Z5", "ZA", "ZY")
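The answer works on a plain character vector. A minimal sketch of applying the same idea to a factor column inside a data frame follows, assuming the column is a factor like `TR_factor$crclscod` in the question. The small data frame and the names `crcls_reduced` and `crcls_binned` are hypothetical stand-ins. The second variant relabels the existing levels instead of rebuilding the factor: base R's `levels<-` merges levels that are given duplicate labels.

```r
# Hypothetical stand-in for the question's data frame (only a subset of the codes)
TR_factor <- data.frame(
  crclscod = factor(c("A", "A2", "AA", "B", "B2", "ZY", "Z1", "TP"))
)

# Variant 1: build a new factor from the first character of each code.
# substr() coerces the factor to character before extracting.
TR_factor$crcls_reduced <- factor(paste("all", substr(TR_factor$crclscod, 1, 1)))

# Variant 2: relabel the existing levels in place.
# Levels that receive the same label (e.g. "A", "A2", "AA") are merged into one.
crcls_binned <- TR_factor$crclscod
levels(crcls_binned) <- paste("all", substr(levels(crcls_binned), 1, 1))

print(levels(TR_factor$crcls_reduced))
print(levels(crcls_binned))
```

Applied to all 51 codes in v1, either variant yields 20 groups, one per leading letter. These match the 20 targets listed in the question's ifelse() attempt ("P1" becomes "all P" and "TP" becomes "all T"). Relabeling the levels only touches the short vector of level names rather than every row, which can be convenient for large data frames.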

