Convert Comma-Separated Column to Columns with Booleans


Problem Description

I have the following comma-separated data in one of my data.frame's columns called services.

> dput(structure(df$services[1:5]))
list("Global Expense Management, Company Privacy Policy", "Removal Services, Global Expense Management", 
    "Removal Services, Exception &amp; Cost Admin, Global Cost Estimate, Company Privacy Policy", 
    "Removal Services, Exception &amp; Cost Admin, Ancillary Services, Global Cost Estimate, Global Expense Management, Perm Storage, Company Privacy Policy", 
    "Global Expense Management, Company Privacy Policy")

I would like to transform this data into separate columns in my dataframe and if the row contains the service, then set TRUE under that service's column. Otherwise, set the value as FALSE.

For example, I would like my dataframe to look like this:

GlobalExpenseManagement    |    CompanyPrivacyPolicy   |   etc...
TRUE                            TRUE
TRUE                            FALSE
FALSE                           TRUE

I assume I would have to split out the comma-separated values, group them to remove duplicates, then add them as names(df) to my dataframe. However, I don't know how to iterate over the dataset and set TRUE/FALSE depending on whether the row contains that service.

Does anyone have any good ideas of how to do this?

Edit: Combining the data back

I am now trying to combine the new matrix with my existing dataframe to replace the services with their new column counterparts. I have tried this based on @plafort's great answer below:

names(df) <- headnames
rbind(mat, df)

However, I get this error:

Error in names(df) <- headnames : 'names' attribute [178] must be the same length as the vector [7]

I have also tried this:

final <- data.frame(cbind(mat, df))

But it seems to be missing the columns from df. How can I combine the columns from mat with those of df?

Solution

I would consider cSplit_e from my "splitstackshape" package. The result is binary (1 and 0) rather than TRUE and FALSE, but that should be easy to convert.

Sample data:

df <- data.frame(services = I(
  list("Global Expense Management, Company Privacy Policy", "Removal Services, Global Expense Management", 
       "Removal Services, Exception &amp; Cost Admin, Global Cost Estimate, Company Privacy Policy", 
       "Removal Services, Exception &amp; Cost Admin, Ancillary Services, Global Cost Estimate, Global Expense Management, Perm Storage, Company Privacy Policy", 
       "Global Expense Management, Company Privacy Policy")))

Convert the "services" column to a vector instead of a list:

df$services <- unlist(df$services)

Now split it up:

library(splitstackshape)
cSplit_e(df, "services", ",", type = "character", fill = 0)
##                                                                                                                                                  services
## 1                                                                                                       Global Expense Management, Company Privacy Policy
## 2                                                                                                             Removal Services, Global Expense Management
## 3                                                              Removal Services, Exception &amp; Cost Admin, Global Cost Estimate, Company Privacy Policy
## 4 Removal Services, Exception &amp; Cost Admin, Ancillary Services, Global Cost Estimate, Global Expense Management, Perm Storage, Company Privacy Policy
## 5                                                                                                       Global Expense Management, Company Privacy Policy
##   services_Ancillary Services services_Company Privacy Policy services_Exception &amp; Cost Admin
## 1                           0                               1                                   0
## 2                           0                               0                                   0
## 3                           0                               1                                   1
## 4                           1                               1                                   1
## 5                           0                               1                                   0
##   services_Global Cost Estimate services_Global Expense Management services_Perm Storage
## 1                             0                                  1                     0
## 2                             0                                  1                     0
## 3                             1                                  0                     0
## 4                             1                                  1                     1
## 5                             0                                  1                     0
##   services_Removal Services
## 1                         0
## 2                         1
## 3                         1
## 4                         1
## 5                         0
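
The conversion from 1/0 to TRUE/FALSE mentioned above could look roughly like the sketch below. It is only an illustration: to_logical_flags is a hypothetical helper name, and the small data frame is a hand-built stand-in shaped like the first two rows of the output above, so that the example runs without the package installed.

```r
# Hypothetical helper (not part of splitstackshape): convert 0/1 indicator
# columns to TRUE/FALSE. `prefix` selects columns by the start of their names.
to_logical_flags <- function(data, prefix) {
  is_flag <- startsWith(names(data), prefix)
  data[is_flag] <- lapply(data[is_flag], function(x) x == 1)
  data
}

# Stand-in for a cSplit_e result (assumption: indicator columns hold 1 and 0
# and are named "services_<service>", as in the printed output above).
out <- data.frame(
  services = c("Global Expense Management, Company Privacy Policy",
               "Removal Services, Global Expense Management"),
  `services_Company Privacy Policy` = c(1, 0),
  `services_Global Expense Management` = c(1, 1),
  `services_Removal Services` = c(0, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

flags <- to_logical_flags(out, "services_")
str(flags)
```

With the real data, the same helper would presumably be applied to the data frame returned by the cSplit_e call; the original "services" column is left untouched because its name does not start with "services_".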

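For comparison, the split / de-duplicate / flag plan described in the question, together with the step of combining the result back into the data frame, can be sketched in base R without any package. This is an alternative illustration, not the cSplit_e approach; the id column and the two-row data frame are assumptions made for the example. The error quoted in the question suggests that headnames held 178 names while df had only 7 columns, so the names could not be assigned to df directly; binding the new columns on with cbind() sidesteps that.

```r
# Illustrative data: the first two rows from the question plus a
# hypothetical `id` column standing in for the other columns of df.
df <- data.frame(
  id = 1:2,
  services = c("Global Expense Management, Company Privacy Policy",
               "Removal Services, Global Expense Management"),
  stringsAsFactors = FALSE
)

# Split each row on commas and trim the surrounding whitespace.
parts <- lapply(strsplit(df$services, ",", fixed = TRUE), trimws)

# The unique service names become the new column names.
service_names <- sort(unique(unlist(parts)))

# Build a logical matrix: one row per input row, one column per service.
mat <- do.call(rbind, lapply(parts, function(p) service_names %in% p))
colnames(mat) <- service_names

# cbind() adds columns side by side (rbind() would stack rows instead).
# The original `services` column is dropped and replaced by the flags.
final <- cbind(df[setdiff(names(df), "services")], mat)
str(final)
```

Because one argument is a data frame, cbind() returns a data frame and keeps the service names as column names, spaces included.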