Splitting a column into multiple columns
Problem description
Given the following data:
data1<-structure(list(var1 = c("2 7", "2 6 7", "2 7", "2 7", "1 7",
"1 7", "1 5", "1 2 7", "1 5", "1 7", "1 2 3 4 5 6 7", "1 2 4 6"
)), .Names = "var1", class = "data.frame", row.names = c(NA,
-12L))
> data1
var1
1 2 7
2 2 6 7
3 2 7
4 2 7
5 1 7
6 1 7
7 1 5
8 1 2 7
9 1 5
10 1 7
11 1 2 3 4 5 6 7
12 1 2 4 6
I would like to split it into seven columns, with each value placed in the column that matches its number, as follows:
v1 v2 v3 v4 v5 v6 v7
1 NA 2 NA NA NA NA 7
2 NA 2 NA NA NA 6 7
3 NA 2 NA NA NA NA 7
4 NA 2 NA NA NA NA 7
5 1 NA NA NA NA NA 7
6 1 NA NA NA NA NA 7
7 1 NA NA NA 5 NA NA
8 1 2 NA NA NA NA 7
9 1 NA NA NA 5 NA NA
10 1 NA NA NA NA NA 7
11 1 2 3 4 5 6 7
12 1 2 NA 4 NA 6 NA
I used tstrsplit from the data.table package as follows:
library(data.table)
setDT(data1)[,tstrsplit(var1," ")]
V1 V2 V3 V4 V5 V6 V7
1: 2 7 NA NA NA NA NA
2: 2 6 7 NA NA NA NA
3: 2 7 NA NA NA NA NA
4: 2 7 NA NA NA NA NA
5: 1 7 NA NA NA NA NA
6: 1 7 NA NA NA NA NA
7: 1 5 NA NA NA NA NA
8: 1 2 7 NA NA NA NA
9: 1 5 NA NA NA NA NA
10: 1 7 NA NA NA NA NA
11: 1 2 3 4 5 6 7
12: 1 2 4 6 NA NA NA
This differs from the expected output: the values are filled in from left to right by position rather than placed in the column matching their number. How can I get the expected output described above?
Recommended answer
Using data.table, you can try:
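library(data.table)
library(magrittr)
# split each row into long format (one value per line), then cast to wide format
setDT(data1)[, strsplit(var1, " "), by = .(rn = seq_len(nrow(data1)))] %>%
  dcast(., rn ~ V1)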
rn 1 2 3 4 5 6 7
1: 1 NA 2 NA NA NA NA 7
2: 2 NA 2 NA NA NA 6 7
3: 3 NA 2 NA NA NA NA 7
4: 4 NA 2 NA NA NA NA 7
5: 5 1 NA NA NA NA NA 7
6: 6 1 NA NA NA NA NA 7
7: 7 1 NA NA NA 5 NA NA
8: 8 1 2 NA NA NA NA 7
9: 9 1 NA NA NA 5 NA NA
10: 10 1 NA NA NA NA NA 7
11: 11 1 2 3 4 5 6 7
12: 12 1 2 NA 4 NA 6 NA
To get rid of the rn column, we can use:
setDT(data1)[, strsplit(var1," "), by = .(rn = 1:nrow(data1))][
, dcast(.SD, rn ~ V1)][, rn := NULL][]
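The casts above name the columns 1 to 7, whereas the expected output in the question uses v1 to v7. A minimal sketch of one way to get those names is to build them inside the dcast() formula. It assumes the data.table package is installed, and the object names `long` and `wide` are illustrative only, not part of the original answer.

```r
library(data.table)

# Same data as in the question
data1 <- data.frame(
  var1 = c("2 7", "2 6 7", "2 7", "2 7", "1 7", "1 7", "1 5", "1 2 7",
           "1 5", "1 7", "1 2 3 4 5 6 7", "1 2 4 6"),
  stringsAsFactors = FALSE
)

# Long format: one row per (row number, value) pair
long <- setDT(copy(data1))[, strsplit(var1, " "),
                           by = .(rn = seq_len(nrow(data1)))]

# Wide format: prefix each value with "v" to form the column name,
# then drop the helper column rn
wide <- dcast(long, rn ~ paste0("v", V1), value.var = "V1")[, rn := NULL][]
wide
```

Note that the cell values remain character strings here, because strsplit() returns character vectors; convert them afterwards if numeric columns are needed.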
Explanation
setDT(data1)[, strsplit(var1," "), by = .(rn = seq_len(nrow(data1)))]
creates a data.table directly in long format
rn V1
1: 1 2
2: 1 7
3: 2 2
4: 2 6
5: 2 7
6: 3 2
7: 3 7
8: 4 2
9: 4 7
10: 5 1
11: 5 7
12: 6 1
13: 6 7
14: 7 1
15: 7 5
16: 8 1
17: 8 2
18: 8 7
19: 9 1
20: 9 5
21: 10 1
22: 10 7
23: 11 1
24: 11 2
25: 11 3
26: 11 4
27: 11 5
28: 11 6
29: 11 7
30: 12 1
31: 12 2
32: 12 4
33: 12 6
rn V1
which is then reshaped to wide format using dcast().
If we used tstrsplit() instead of strsplit(), we would get a data.table in wide format, which first needs to be reshaped to long format using melt() before it can be cast:
setDT(data1)[,tstrsplit(var1," ")][, rn := .I][
, melt(.SD, id = "rn", na.rm = TRUE)][
, dcast(.SD, rn ~ paste0("V", value))][
, rn := NULL][]
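For comparison, the same layout can be sketched in base R without any reshaping. This is not the answer's method, only an illustration under the assumption that every value is an integer between 1 and 7. The names `wanted`, `parts`, and `result` are hypothetical.

```r
# Same data as in the question
data1 <- data.frame(
  var1 = c("2 7", "2 6 7", "2 7", "2 7", "1 7", "1 7", "1 5", "1 2 7",
           "1 5", "1 7", "1 2 3 4 5 6 7", "1 2 4 6"),
  stringsAsFactors = FALSE
)

# Assumption: the target columns correspond to the values 1 to 7
wanted <- 1:7

# Split each string into its individual values
parts <- strsplit(data1$var1, " ", fixed = TRUE)

# For each row, look up every wanted value; match() returns NA when the
# value is absent, so indexing with it yields NA in that column
wide <- t(vapply(parts, function(p) {
  vals <- as.integer(p)
  vals[match(wanted, vals)]
}, integer(length(wanted))))

colnames(wide) <- paste0("v", wanted)
result <- as.data.frame(wide)
result
```

Unlike the dcast() approach, this always produces all seven columns, even if one of the values never occurs in the data.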