Change the name of specific levels in a factor

Problem description

The data frame I am working on contains many factors. Take the categorical variables from mtcars (cyl, vs, am, gear, carb).

head(mtcars[c("cyl","vs","am","gear","carb")])
                  cyl vs am gear carb
Mazda RX4           6  0  1    4    4
Mazda RX4 Wag       6  0  1    4    4
Datsun 710          4  1  1    4    1
Hornet 4 Drive      6  1  0    3    1
Hornet Sportabout   8  0  0    3    2
Valiant             6  1  0    3    1

Currently I have two nested for loops that extract the levels occurring less than 10% of the time in a given factor and assign them to a new level name. I would like to assign those levels in each factor to a new level named guz. Is there an elegant way to do that?

The output would be a data frame in which, for every factor (assume the columns above are factors in the data set), the rows belonging to a level that occurs less than 10% of the time are assigned to a new level guz. Take level 2 in carb: it occurs only once (that is more than 10 percent here, but imagine it were below the threshold). That level, along with every other level in the factor for which this is true, would be reclassified into the new level guz. The new carb column would then be 4,4,1,1,guz,1.

The output for a 50% threshold would be:

head(mtcars[c("cyl","vs","am","gear","carb")])
                  cyl vs am gear carb
Mazda RX4           6  0  1    4  guz
Mazda RX4 Wag       6  0  1    4  guz
Datsun 710        guz  1  1    4    1
Hornet 4 Drive      6  1  0    3    1
Hornet Sportabout guz  0  0    3  guz
Valiant             6  1  0    3    1

Solution

First, let's turn the columns in mtcars into explicit factors:

cols = c("vs","am","gear","cyl", "carb")
for(col in cols){mtcars[,col]=factor(paste0(col,mtcars[,col]))}

Now write a function that takes a factor and returns a factor with levels reclassified as you want. Make it flexible with the label and the threshold:

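thresh_factor = function(f, thresh=0.1, label="guz"){
         n=length(f)               # total number of observations
         counts=table(f)           # occurrences of each level
         under=counts<(n*thresh)   # TRUE for levels rarer than the threshold
         levels(f)[under]=label    # rename them; identical names are merged
         f}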
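
The function relies on one behaviour of R factors: assigning the same name to several levels merges them into a single level. A minimal sketch on a toy factor (the values below are made up purely for illustration) shows the effect in isolation:

```r
# Toy factor, invented for illustration only
f <- factor(c("a", "a", "a", "b", "c"))

# Logical mask over the levels "a", "b", "c": rename the last two
rare <- c(FALSE, TRUE, TRUE)
levels(f)[rare] <- "guz"

levels(f)   # "b" and "c" now share the single level "guz"
table(f)    # their counts are pooled under "guz"
```

Because the merge happens at the level of the factor's labels, no loop over rows is needed.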

This can now be tested:

> thresh_factor(factor(1:20))
 [1] guz guz guz guz guz guz guz guz guz guz guz guz guz guz guz guz guz guz guz
[20] guz
Levels: guz

They all become guz because each value in 1:20 occurs exactly once, which is 5% of the observations and below the 10% threshold. More tests:

> thresh_factor(mtcars$carb)
 [1] carb4 carb4 carb1 carb1 carb2 carb1 carb4 carb2 carb2 carb4 carb4 guz  
[13] guz   guz   carb4 carb4 carb4 carb1 carb2 carb1 carb1 carb2 carb2 carb4
[25] carb2 carb1 carb2 carb2 carb4 guz   guz   carb2
Levels: carb1 carb2 guz carb4

Some of the levels there have been replaced. Another test:

> thresh_factor(mtcars$cyl)
 [1] cyl6 cyl6 cyl4 cyl6 cyl8 cyl6 cyl8 cyl4 cyl4 cyl6 cyl6 cyl8 cyl8 cyl8 cyl8
[16] cyl8 cyl8 cyl4 cyl4 cyl4 cyl4 cyl8 cyl8 cyl8 cyl8 cyl4 cyl4 cyl4 cyl8 cyl6
[31] cyl8 cyl4
Levels: cyl4 cyl6 cyl8

None of the levels there are replaced. Looks good. Now apply it to all the columns:

> for(col in cols){mtcars[,col]=thresh_factor(mtcars[,col])}
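
If you would rather avoid the explicit for loop, the same step can probably be written with lapply. The sketch below is an alternative formulation, not part of the original answer: it repeats the function so that it runs on its own, works on a copy named df (a hypothetical name) and uses plain factor() rather than the paste0 labels above.

```r
# Same logic as thresh_factor above, repeated so this block is self-contained
thresh_factor <- function(f, thresh = 0.1, label = "guz") {
  under <- table(f) < length(f) * thresh   # levels rarer than the threshold
  levels(f)[under] <- label                # merge them into one level
  f
}

cols <- c("vs", "am", "gear", "cyl", "carb")
df <- datasets::mtcars                       # copy, so mtcars stays untouched

df[cols] <- lapply(df[cols], factor)         # convert the columns to factors
df[cols] <- lapply(df[cols], thresh_factor)  # relabel the rare levels

levels(df$carb)
levels(df$cyl)
```

Extra arguments pass through lapply, so lapply(df[cols], thresh_factor, thresh = 0.5) would apply a 50% threshold instead.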

Just to test again using your sample output, with numeric factor levels and a 50% threshold:

> rm(mtcars) # start fresh
> mtcars=head(mtcars) # first 6 rows for test
> for(col in cols){mtcars[,col]=factor(mtcars[,col])} # convert columns to factors

Now run my code:

> for(col in cols){mtcars[,col]=thresh_factor(mtcars[,col],thresh=0.5)}
> head(mtcars[c("cyl","vs","am","gear","carb")])
                  cyl vs am gear carb
Mazda RX4           6  0  1    4  guz
Mazda RX4 Wag       6  0  1    4  guz
Datsun 710        guz  1  1    4    1
Hornet 4 Drive      6  1  0    3    1
Hornet Sportabout guz  0  0    3  guz
Valiant             6  1  0    3    1

This matches your expected output.
