Splitting a column into multiple columns


Question

Given the following data,

data1<-structure(list(var1 = c("2 7", "2 6 7", "2 7", "2 7", "1 7", 
"1 7", "1 5", "1 2 7", "1 5", "1 7", "1 2 3 4 5 6 7", "1 2 4 6"
)), .Names = "var1", class = "data.frame", row.names = c(NA, 
-12L))

> data1

            var1
1            2 7
2          2 6 7
3            2 7
4            2 7
5            1 7
6            1 7
7            1 5
8          1 2 7
9            1 5
10           1 7
11 1 2 3 4 5 6 7
12       1 2 4 6

I would like to split it into seven columns, with each value placed in the column that matches it, as follows:

    v1  v2  v3  v4  v5  v6  v7
1   NA  2   NA  NA  NA  NA  7
2   NA  2   NA  NA  NA  6   7
3   NA  2   NA  NA  NA  NA  7
4   NA  2   NA  NA  NA  NA  7
5   1   NA  NA  NA  NA  NA  7
6   1   NA  NA  NA  NA  NA  7
7   1   NA  NA  NA  5   NA  NA
8   1   2   NA  NA  NA  NA  7
9   1   NA  NA  NA  5   NA  NA
10  1   NA  NA  NA  NA  NA  7
11  1   2   3   4   5   6   7
12  1   2   NA  4   NA  6   NA

I used tstrsplit from the data.table package as follows:

library(data.table)
setDT(data1)[,tstrsplit(var1," ")]



 V1 V2 V3 V4 V5 V6 V7
 1:  2  7 NA NA NA NA NA
 2:  2  6  7 NA NA NA NA
 3:  2  7 NA NA NA NA NA
 4:  2  7 NA NA NA NA NA
 5:  1  7 NA NA NA NA NA
 6:  1  7 NA NA NA NA NA
 7:  1  5 NA NA NA NA NA
 8:  1  2  7 NA NA NA NA
 9:  1  5 NA NA NA NA NA
10:  1  7 NA NA NA NA NA
11:  1  2  3  4  5  6  7
12:  1  2  4  6 NA NA NA

This is different from the expected output: the values are filled in by position rather than placed in the column matching each value. How can I get the expected output described above?

Recommended answer

With data.table, you can try

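library(magrittr)  # provides the %>% pipe
# Split into long format (one value per row), then cast to wide format.
setDT(data1)[, strsplit(var1, " "), by = .(rn = seq_len(nrow(data1)))] %>% 
  dcast(., rn ~ V1, value.var = "V1")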




       rn      1      2      3      4      5      6      7
 1:     1     NA      2     NA     NA     NA     NA      7
 2:     2     NA      2     NA     NA     NA      6      7
 3:     3     NA      2     NA     NA     NA     NA      7
 4:     4     NA      2     NA     NA     NA     NA      7
 5:     5      1     NA     NA     NA     NA     NA      7
 6:     6      1     NA     NA     NA     NA     NA      7
 7:     7      1     NA     NA     NA      5     NA     NA
 8:     8      1      2     NA     NA     NA     NA      7
 9:     9      1     NA     NA     NA      5     NA     NA
10:    10      1     NA     NA     NA     NA     NA      7
11:    11      1      2      3      4      5      6      7
12:    12      1      2     NA      4     NA      6     NA


To get rid of the rn column, we can use

setDT(data1)[, strsplit(var1," "), by = .(rn = 1:nrow(data1))][
  , dcast(.SD, rn ~ V1)][, rn := NULL][]
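
The result above has columns named 1 to 7 and holds character values, while the question asks for v1 to v7. The following is a minimal sketch of one way to finish the job, not part of the original answer: the names long, wide and cols are illustrative, and it assumes every value from 1 to 7 occurs at least once in the data (otherwise the corresponding column would be missing).

```r
library(data.table)

data1 <- data.frame(
  var1 = c("2 7", "2 6 7", "2 7", "2 7", "1 7", "1 7", "1 5",
           "1 2 7", "1 5", "1 7", "1 2 3 4 5 6 7", "1 2 4 6"),
  stringsAsFactors = FALSE
)

# Long format: one row per (row number, value) pair.
long <- as.data.table(data1)[
  , .(value = unlist(strsplit(var1, " "))),
  by = .(rn = seq_len(nrow(data1)))
]

# Wide format: one column per distinct value; the cell holds the value itself.
wide <- dcast(long, rn ~ value, value.var = "value")
wide[, rn := NULL]

# Rename "1".."7" to "v1".."v7" and convert the character cells to integers.
setnames(wide, paste0("v", names(wide)))
cols <- names(wide)
wide[, (cols) := lapply(.SD, as.integer), .SDcols = cols]

print(wide)
```

Passing value.var explicitly avoids the message dcast() prints when it has to guess which column holds the values.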



Explanation

setDT(data1)[, strsplit(var1," "), by = .(rn = seq_len(nrow(data1)))]

creates a data.table directly in long format


    rn V1
 1:  1  2
 2:  1  7
 3:  2  2
 4:  2  6
 5:  2  7
 6:  3  2
 7:  3  7
 8:  4  2
 9:  4  7
10:  5  1
11:  5  7
12:  6  1
13:  6  7
14:  7  1
15:  7  5
16:  8  1
17:  8  2
18:  8  7
19:  9  1
20:  9  5
21: 10  1
22: 10  7
23: 11  1
24: 11  2
25: 11  3
26: 11  4
27: 11  5
28: 11  6
29: 11  7
30: 12  1
31: 12  2
32: 12  4
33: 12  6
    rn V1


which is then reshaped to wide format using dcast().

If we used tstrsplit() instead of strsplit(), we would get a data.table in wide format, which first needs to be reshaped to long format using melt() before it can be cast again:

setDT(data1)[,tstrsplit(var1," ")][, rn := .I][
  , melt(.SD, id = "rn", na.rm = TRUE)][
    , dcast(.SD, rn ~ paste0("V", value))][
      , rn := NULL][]
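
For comparison, the same result can be sketched in base R without any reshaping. This is an alternative to the answer above rather than a part of it: the helper place_values and the variables vals, wide_mat and wide are hypothetical names, and the sketch assumes every entry is an integer between 1 and 7.

```r
data1 <- data.frame(
  var1 = c("2 7", "2 6 7", "2 7", "2 7", "1 7", "1 7", "1 5",
           "1 2 7", "1 5", "1 7", "1 2 3 4 5 6 7", "1 2 4 6"),
  stringsAsFactors = FALSE
)

n_cols <- 7L

# Put each value k of one row into position k of an NA-filled vector.
place_values <- function(x) {
  out <- rep(NA_integer_, n_cols)
  idx <- as.integer(x)
  out[idx] <- idx
  out
}

vals <- strsplit(data1$var1, " ", fixed = TRUE)

# vapply() returns one column per input row, so transpose the result.
wide_mat <- t(vapply(vals, place_values, integer(n_cols)))
colnames(wide_mat) <- paste0("v", seq_len(n_cols))

wide <- as.data.frame(wide_mat)
print(wide)
```

Because the target position is computed directly from each value, all seven columns are always present, even when some value never occurs in the data.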
