Separating a column in R
Question
I have simple data like the sample below, taken from the MovieLens 1M data files:
item_id title genres
1 1 Toy Story (1995) Animation|Children's|Comedy
2 2 Jumanji (1995) Adventure|Children's|Fantasy
3 3 Grumpier Old Men (1995) Comedy|Romance
4 4 Waiting to Exhale (1995) Comedy|Drama
5 5 Father of the Bride Part II (1995) Comedy
6 6 Heat (1995) Action|Crime|Thriller
My genres column contains 19 distinct values, listed below. How should I change my data so that each genre becomes its own column?
genreTbl['title']
title
1 unknown
2 Action
3 Adventure
4 Animation
5 Children's
6 Comedy
7 Crime
8 Documentary
9 Drama
10 Fantasy
11 Film-Noir
12 Horror
13 Musical
14 Mystery
15 Romance
16 Sci-Fi
17 Thriller
18 War
19 Western
I want to change my data to this structure:
item_id movie_title release_date
1 1 Toy Story (1995) <NA>
2 2 GoldenEye (1995) <NA>
3 3 Four Rooms (1995) <NA>
4 4 Get Shorty (1995) <NA>
5 5 Copycat (1995) <NA>
6 6 Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) <NA>
unknown Action Adventure Animation Children's Comedy Crime Documentary Drama
1 0 0 0 1 1 1 0 0 0
2 0 1 1 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 1 0 0 0 1 0 0 1
5 0 0 0 0 0 0 1 0 1
6 0 0 0 0 0 0 0 0 1
Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 1 0 0
3 0 0 0 0 0 0 0 1 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 1 0 0
6 0 0 0 0 0 0 0 0 0 0
I need every genre to be its own column, as above. If an item's genres value contains a given genre, that column should be 1, otherwise 0.
Recommended answer
Use a combination of cSplit from the splitstackshape package and dcast from reshape2 / data.table. With length as the aggregation function, each cell holds the number of rows for that movie/genre pair, which works as an integer indicator: 1 if the movie has the genre, 0 if it does not.
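library(splitstackshape)
library(reshape2) # or library(data.table)
# Split genres on "|" into long format (one row per movie/genre pair),
# then cast back to wide, counting the rows for each pair with length.
dcast(cSplit(mydf, "genres", sep = "|", direction = "long"),
      item_id + title ~ genres,
      fun.aggregate = length)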
which gives:
item_id title Action Adventure Animation Children's Comedy Crime Drama Fantasy Romance Thriller
1: 1 ToyStory(1995) 0 0 1 1 1 0 0 0 0 0
2: 2 Jumanji(1995) 0 1 0 1 0 0 0 1 0 0
3: 3 GrumpierOldMen(1995) 0 0 0 0 1 0 0 0 1 0
4: 4 WaitingtoExhale(1995) 0 0 0 0 1 0 1 0 0 0
5: 5 FatheroftheBridePartII(1995) 0 0 0 0 1 0 0 0 0 0
6: 6 Heat(1995) 1 0 0 0 0 1 0 0 0 1
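To see why length yields 0/1 values, the same split-to-long-then-count idea can be sketched in base R without any packages. This is only an illustration of the mechanism, not the answer's code; the two-row toy data frame and the names parts, long and counts are made up for the example.

```r
# Hypothetical two-movie toy data, just to illustrate the mechanism.
toy <- data.frame(
  item_id = 1:2,
  genres  = c("Comedy|Romance", "Comedy"),
  stringsAsFactors = FALSE
)

# Split each genres string on the literal "|" character.
parts <- strsplit(toy$genres, "|", fixed = TRUE)

# Long format: one row per (item_id, genre) pair.
long <- data.frame(
  item_id = rep(toy$item_id, lengths(parts)),
  genre   = unlist(parts),
  stringsAsFactors = FALSE
)

# Counting rows per pair plays the role of fun.aggregate = length:
# a pair that occurs once counts 1, a pair that never occurs counts 0.
counts <- table(long$item_id, long$genre)
print(counts)
```

Because each movie lists a genre at most once, the counts can only be 0 or 1, which is what makes them usable as indicator values.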
Data used in the answer:
mydf <- structure(list(item_id = 1:6, title = structure(c(5L, 4L, 2L,
6L, 1L, 3L), .Label = c("FatheroftheBridePartII(1995)", "GrumpierOldMen(1995)",
"Heat(1995)", "Jumanji(1995)", "ToyStory(1995)", "WaitingtoExhale(1995)"
), class = "factor"), genres = structure(c(3L, 2L, 6L, 5L, 4L,
1L), .Label = c("Action|Crime|Thriller", "Adventure|Children's|Fantasy",
"Animation|Children's|Comedy", "Comedy", "Comedy|Drama", "Comedy|Romance"
), class = "factor")), .Names = c("item_id", "title", "genres"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6"))
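The question asks for all 19 genres as columns, including genres that no movie in the sample has. One possible way to get that, sketched here in base R, is to test each movie's genre list against the full genre vector. The names all_genres, genre_list, indicators and result are made up for this example, and the three-row data frame is a shortened stand-in for the real data.

```r
# Shortened stand-in for the question's data (an assumption for illustration).
movies <- data.frame(
  item_id = 1:3,
  title   = c("Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"),
  genres  = c("Animation|Children's|Comedy",
              "Adventure|Children's|Fantasy",
              "Action|Crime|Thriller"),
  stringsAsFactors = FALSE
)

# The 19 genre names listed in the question's genreTbl.
all_genres <- c("unknown", "Action", "Adventure", "Animation", "Children's",
                "Comedy", "Crime", "Documentary", "Drama", "Fantasy",
                "Film-Noir", "Horror", "Musical", "Mystery", "Romance",
                "Sci-Fi", "Thriller", "War", "Western")

# Split each genres string on the literal "|" character.
genre_list <- strsplit(as.character(movies$genres), "|", fixed = TRUE)

# For each movie, mark every known genre as 1L if listed, else 0L.
# vapply returns one column per movie, so transpose to one row per movie.
indicators <- t(vapply(genre_list,
                       function(g) as.integer(all_genres %in% g),
                       integer(length(all_genres))))
colnames(indicators) <- all_genres

# Attach the indicator columns to the identifying columns.
result <- cbind(movies[c("item_id", "title")], indicators)
print(result)
```

Unlike casting from long to wide, this keeps a column for every genre in all_genres even when it is all zeros, at the cost of having to supply the genre list up front.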