Separating a column in R


Question

I have simple data like the sample below, taken from the MovieLens 1M data files:

  item_id                              title                       genres
1       1                   Toy Story (1995)  Animation|Children's|Comedy
2       2                     Jumanji (1995) Adventure|Children's|Fantasy
3       3            Grumpier Old Men (1995)               Comedy|Romance
4       4           Waiting to Exhale (1995)                 Comedy|Drama
5       5 Father of the Bride Part II (1995)                       Comedy
6       6                        Heat (1995)        Action|Crime|Thriller

The genres column can contain any of the following 19 values. How should I reshape my data?

genreTbl['title']
         title
1      unknown
2       Action
3    Adventure
4    Animation
5   Children's
6       Comedy
7        Crime
8  Documentary
9        Drama
10     Fantasy
11   Film-Noir
12      Horror
13     Musical
14     Mystery
15     Romance
16      Sci-Fi
17    Thriller
18         War
19     Western






I want to change my data to this structure:

  item_id                                          movie_title release_date
1       1                                     Toy Story (1995)         <NA>
2       2                                     GoldenEye (1995)         <NA>
3       3                                    Four Rooms (1995)         <NA>
4       4                                    Get Shorty (1995)         <NA>
5       5                                       Copycat (1995)         <NA>
6       6 Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)         <NA>
  unknown Action Adventure Animation Children's Comedy Crime Documentary Drama
1       0      0         0         1          1      1     0           0     0
2       0      1         1         0          0      0     0           0     0
3       0      0         0         0          0      0     0           0     0
4       0      1         0         0          0      1     0           0     1
5       0      0         0         0          0      0     1           0     1
6       0      0         0         0          0      0     0           0     1
  Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
1       0         0      0       0       0       0      0        0   0       0
2       0         0      0       0       0       0      0        1   0       0
3       0         0      0       0       0       0      0        1   0       0
4       0         0      0       0       0       0      0        0   0       0
5       0         0      0       0       0       0      0        1   0       0
6       0         0      0       0       0       0      0        0   0       0

I need every genre to be its own column, as above: the value should be 1 if the item's genres include that genre, and 0 otherwise.

Recommended answer

Use a combination of cSplit from the splitstackshape package and dcast from reshape2 / data.table. With length as the aggregation function, each cell counts how often a genre occurs for a movie, which gives 0/1 integer indicator values:

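library(splitstackshape)
library(reshape2)   # or library(data.table)

# Split "genres" on "|" into one row per (movie, genre) pair, then cast back
# to wide form, counting occurrences so each genre column holds 0 or 1.
dcast(cSplit(mydf, "genres", sep = "|", direction = "long"),
      item_id + title ~ genres,
      fun.aggregate = length)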

which gives:

   item_id                        title Action Adventure Animation Children's Comedy Crime Drama Fantasy Romance Thriller
1:       1               ToyStory(1995)      0         0         1          1      1     0     0       0       0        0
2:       2                Jumanji(1995)      0         1         0          1      0     0     0       1       0        0
3:       3         GrumpierOldMen(1995)      0         0         0          0      1     0     0       0       1        0
4:       4        WaitingtoExhale(1995)      0         0         0          0      1     0     1       0       0        0
5:       5 FatheroftheBridePartII(1995)      0         0         0          0      1     0     0       0       0        0
6:       6                   Heat(1995)      1         0         0          0      0     1     0       0       0        1
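Note that this result only has columns for genres that actually occur in the data, whereas the question asks for all 19. The following is a minimal base-R sketch of one way to get a column for every genre, not part of the original answer. The names all_genres, genre_list, indicators and result are introduced here for illustration, and the genre vector is copied by hand from the genreTbl listing in the question. Only the first three sample rows are used to keep it short.

```r
# First three sample rows from the question.
mydf <- data.frame(
  item_id = 1:3,
  title   = c("Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"),
  genres  = c("Animation|Children's|Comedy",
              "Adventure|Children's|Fantasy",
              "Comedy|Romance"),
  stringsAsFactors = FALSE
)

# The 19 genre names from genreTbl (typed in by hand; an assumption of this sketch).
all_genres <- c("unknown", "Action", "Adventure", "Animation", "Children's",
                "Comedy", "Crime", "Documentary", "Drama", "Fantasy",
                "Film-Noir", "Horror", "Musical", "Mystery", "Romance",
                "Sci-Fi", "Thriller", "War", "Western")

# Split each genres string on a literal "|" (fixed = TRUE disables regex).
genre_list <- strsplit(as.character(mydf$genres), "|", fixed = TRUE)

# One row per movie, one column per genre: 1 if the genre is present, else 0.
indicators <- t(vapply(genre_list,
                       function(g) as.integer(all_genres %in% g),
                       integer(length(all_genres))))
colnames(indicators) <- all_genres

# Attach the indicator columns to the identifying columns.
result <- cbind(mydf[c("item_id", "title")], indicators)
```

Because the columns come from all_genres rather than from the data, genres with no matching movie (such as unknown or Western here) still appear as all-zero columns.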






Used data:

mydf <- structure(list(item_id = 1:6, title = structure(c(5L, 4L, 2L, 
6L, 1L, 3L), .Label = c("FatheroftheBridePartII(1995)", "GrumpierOldMen(1995)", 
"Heat(1995)", "Jumanji(1995)", "ToyStory(1995)", "WaitingtoExhale(1995)"
), class = "factor"), genres = structure(c(3L, 2L, 6L, 5L, 4L, 
1L), .Label = c("Action|Crime|Thriller", "Adventure|Children's|Fantasy", 
"Animation|Children's|Comedy", "Comedy", "Comedy|Drama", "Comedy|Romance"
), class = "factor")), .Names = c("item_id", "title", "genres"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6"))
