Creating a factor variable with dplyr?

Problem description

Suppose I have a data frame that looks something like this:

df1 = structure(list(
  Name = structure(1:6, .Label = c("N1", "N2", "N3", "N4", "N5", "N6", "N7"), class = "factor"),
  sector = structure(c(4L, 4L, 4L, 3L, 3L, 2L),
                     .Label = c("other stuff",
                                "Private for-profit, 4-year or above",
                                "Private not-for-profit, 4-year or above",
                                "Public, 4-year or above"),
                     class = "factor"),
  flagship = c(1, 0, 0, 0, 0, 0)),
  .Names = c("Name", "sector", "flagship"),
  row.names = c(NA, 6L), class = "data.frame")

I want to create a new factor variable, "Sector". I can do it in a long way with many lines of code, but I'm sure there is a more efficient way.

Right now this is what I'm doing:

df1$PublicFlag=0
df1$PublicFlag[df1$sector=="Public, 4-year or above" & df1$flagship==1]=1
df1$Public=0
df1$Public[df1$sector=="Public, 4-year or above" & df1$flagship==0]=1
df1$PrivateNP=0
df1$PrivateNP[df1$sector=="Private not-for-profit"]=1
df1$Private4P=0
df1$Private4P[df1$sector=="Private for-profit, 4-year or above"]=1

library(reshape)
df2 = melt(df1, id=c("Name", "sector", "flagship"))
df2 = df2[df2$value==1,c("Name", "sector", "flagship", "variable")]
library(plyr)
df2 = rename(df2, c("variable"="Sector"))

Thanks for the help!

Solution

You don't really even need dplyr:

df1$Sector <- factor(ifelse(df1$sector=="Public, 4-year or above" & df1$flagship==1, "PublicFlag",
                       ifelse(df1$sector=="Public, 4-year or above" & df1$flagship==0, "Public",
                         ifelse(df1$sector=="Private not-for-profit", "PrivateNP", 
                           ifelse(df1$sector=="Private for-profit, 4-year or above", "Private4P", NA)))))


df1

##   Name                                  sector flagship     Sector
## 1   N1                 Public, 4-year or above        1 PublicFlag
## 2   N2                 Public, 4-year or above        0     Public
## 3   N3                 Public, 4-year or above        0     Public
## 4   N4 Private not-for-profit, 4-year or above        0       <NA>
## 5   N5 Private not-for-profit, 4-year or above        0       <NA>
## 6   N6     Private for-profit, 4-year or above        0  Private4P

If you need it, you can replace the final NA in the nested ifelse() with the last remaining factor level.
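For readers who do want a dplyr version, here is a minimal sketch using `mutate()` with `case_when()`. Two things in it are assumptions rather than part of the original answer. First, rows 4 and 5 appear to come out as NA above because the third test compares against "Private not-for-profit", while the level in the example data is "Private not-for-profit, 4-year or above"; the sketch matches the full label. Second, the catch-all level name "Other" is hypothetical. The data frame is rebuilt compactly with the same values, leaving out the unused levels.

```r
library(dplyr)

# Example data from the question (same values; unused factor levels omitted).
df1 <- data.frame(
  Name = factor(paste0("N", 1:6)),
  sector = factor(c(rep("Public, 4-year or above", 3),
                    rep("Private not-for-profit, 4-year or above", 2),
                    "Private for-profit, 4-year or above")),
  flagship = c(1, 0, 0, 0, 0, 0)
)

# case_when() checks the conditions in order and uses the first one that is TRUE.
df1 <- df1 %>%
  mutate(Sector = factor(case_when(
    sector == "Public, 4-year or above" & flagship == 1 ~ "PublicFlag",
    sector == "Public, 4-year or above" & flagship == 0 ~ "Public",
    # Full label (assumption: this is the intended match for rows 4 and 5).
    sector == "Private not-for-profit, 4-year or above" ~ "PrivateNP",
    sector == "Private for-profit, 4-year or above" ~ "Private4P",
    # Catch-all branch; "Other" is a hypothetical name for unmatched rows.
    TRUE ~ "Other"
  )))

df1
```

With the example data every row matches one of the first four conditions, so the catch-all branch is never used. Replacing `"Other"` with `NA_character_` would keep unmatched rows as NA, as in the base R answer.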
