Creating a factor variable with dplyr?
Question
Suppose I have a data frame that looks something like this:
df1=structure(list(Name = structure(1:6, .Label = c("N1", "N2", "N3",
"N4", "N5", "N6", "N7"), class = "factor"), sector = structure(c(4L,
4L, 4L, 3L, 3L, 2L), .Label = c("other stuff", "Private for-profit, 4-year or above",
"Private not-for-profit, 4-year or above", "Public, 4-year or above"
), class = "factor"), flagship = c(1, 0, 0, 0, 0, 0)), .Names = c("Name",
"sector", "flagship"), row.names = c(NA, 6L), class = "data.frame")
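Every comparison in the code that follows matches sector against a literal string. With a factor, == compares against the level labels, so only a complete label matches. A shortened string is not an error: it silently matches no rows. The quick check below lists the exact labels. It re-creates df1 from the dput() output so it runs on its own and is meant as an illustrative sketch.

```r
# Same data as the dput() output in the question
df1 <- structure(list(Name = structure(1:6, .Label = c("N1", "N2", "N3",
  "N4", "N5", "N6", "N7"), class = "factor"), sector = structure(c(4L,
  4L, 4L, 3L, 3L, 2L), .Label = c("other stuff", "Private for-profit, 4-year or above",
  "Private not-for-profit, 4-year or above", "Public, 4-year or above"
  ), class = "factor"), flagship = c(1, 0, 0, 0, 0, 0)), .Names = c("Name",
  "sector", "flagship"), row.names = c(NA, 6L), class = "data.frame")

# The exact labels that any == comparison must reproduce
print(levels(df1$sector))

# A shortened label is not a level, so it matches 0 rows
print(sum(df1$sector == "Private not-for-profit"))
# The full label matches 2 rows (N4 and N5)
print(sum(df1$sector == "Private not-for-profit, 4-year or above"))
```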
I want to create a new factor variable, "Sector". I can do it the long way with many lines of code, but I'm sure there is a more efficient way.
Right now this is what I'm doing:
# One 0/1 indicator column per target category
df1$PublicFlag=0
df1$PublicFlag[df1$sector=="Public, 4-year or above" & df1$flagship==1]=1
df1$Public=0
df1$Public[df1$sector=="Public, 4-year or above" & df1$flagship==0]=1
df1$PrivateNP=0
df1$PrivateNP[df1$sector=="Private not-for-profit, 4-year or above"]=1
df1$Private4P=0
df1$Private4P[df1$sector=="Private for-profit, 4-year or above"]=1
# Reshape to long format and keep only the rows where an indicator is 1
library(reshape)
df2 = melt(df1, id=c("Name", "sector", "flagship"))
df2 = df2[df2$value==1,c("Name", "sector", "flagship", "variable")]
# Rename the indicator-name column to "Sector"
library(plyr)
df2 = rename(df2, c("variable"="Sector"))
Thanks for the help!
Solution
You don't really even need dplyr:
# Nested ifelse(): the first matching condition wins; rows matching none get NA
df1$Sector <- factor(ifelse(df1$sector=="Public, 4-year or above" & df1$flagship==1, "PublicFlag",
ifelse(df1$sector=="Public, 4-year or above" & df1$flagship==0, "Public",
ifelse(df1$sector=="Private not-for-profit, 4-year or above", "PrivateNP",
ifelse(df1$sector=="Private for-profit, 4-year or above", "Private4P", NA)))))
df1
## Name sector flagship Sector
## 1 N1 Public, 4-year or above 1 PublicFlag
## 2 N2 Public, 4-year or above 0 Public
## 3 N3 Public, 4-year or above 0 Public
## 4 N4 Private not-for-profit, 4-year or above 0 PrivateNP
## 5 N5 Private not-for-profit, 4-year or above 0 PrivateNP
## 6 N6 Private for-profit, 4-year or above 0 Private4P
Rows whose sector matches none of the conditions (such as "other stuff") get NA. If you need a final catch-all factor level, you can use it in place of NA.
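Since the question asks about dplyr, the same logic can also be written with mutate() and case_when(), which avoids nesting ifelse() calls. This is a minimal sketch that assumes dplyr is installed. The "Other" catch-all label and the explicit level order are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same data as the dput() output in the question
df1 <- structure(list(Name = structure(1:6, .Label = c("N1", "N2", "N3",
  "N4", "N5", "N6", "N7"), class = "factor"), sector = structure(c(4L,
  4L, 4L, 3L, 3L, 2L), .Label = c("other stuff", "Private for-profit, 4-year or above",
  "Private not-for-profit, 4-year or above", "Public, 4-year or above"
  ), class = "factor"), flagship = c(1, 0, 0, 0, 0, 0)), .Names = c("Name",
  "sector", "flagship"), row.names = c(NA, 6L), class = "data.frame")

df1 <- df1 %>%
  mutate(
    # case_when() uses the first condition that is TRUE for each row
    Sector = case_when(
      sector == "Public, 4-year or above" & flagship == 1 ~ "PublicFlag",
      sector == "Public, 4-year or above" & flagship == 0 ~ "Public",
      sector == "Private not-for-profit, 4-year or above" ~ "PrivateNP",
      sector == "Private for-profit, 4-year or above" ~ "Private4P",
      TRUE ~ "Other"  # hypothetical catch-all level used instead of NA
    ),
    # Convert to a factor with an explicit (illustrative) level order
    Sector = factor(Sector,
                    levels = c("PublicFlag", "Public", "PrivateNP", "Private4P", "Other"))
  )

print(df1)
```

Because the catch-all branch comes last, a row with sector "other stuff" would get the "Other" level instead of NA. Drop that line if you prefer NA. In dplyr 1.1.0 and later, case_when() also accepts a .default argument for the same purpose.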