Splitting rows with uneven string length into columns in R using tidyr

Question

This was marked as a duplicate. It is not. The question here is not only about splitting a single column into multiple ones, since my separate code would have worked for that. The main point of my question is splitting the column when the row strings produce a varying number of output columns.

I am trying to turn this:

data <- c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
          "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
          "Place1-Place1-Place1-Place1-Place3-Place5",
          "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
          "Place6-Place6",
          "Place1-Place2-Place3-Place4")

into this:

      X1     X2     X3     X4     X5     X6     X7     X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5 
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5 
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5 
5 Place6 Place6 
6 Place1 Place2 Place3 Place4

I tried to use tidyr's separate function with this code:

library(data.table)
data <- as.data.table(data)
data_table <- tidyr::separate(data,
                            data,
                            sep="-",
                            into = strsplit(data$data, "-"),
                            fill = "right")

Sadly I'm getting this error:

Warning message:
Too many values at 3 locations: 1, 2, 4 

What do I need to change to make it work?

Recommended answer

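You need to specify the target columns correctly. The into argument expects a character vector of column names, not the list that strsplit returns: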

library(tidyr)
separate(DF, V1, paste0("X",1:8), sep="-")

which gives:

      X1     X2     X3     X4     X5     X6     X7     X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5   <NA>
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5   <NA>   <NA>
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5   <NA>
5 Place6 Place6   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>
6 Place1 Place2 Place3 Place4   <NA>   <NA>   <NA>   <NA>

If you don't know how many target columns you need beforehand, you can use:

> max(sapply(strsplit(as.character(DF$V1),'-'),length))
[1] 8

to extract the maximum number of parts (which is thus the number of columns you need).
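The count above can feed straight into separate so that the number of columns is not hard-coded. The following is a minimal sketch, assuming a shortened version of the sample data; the names n_cols and res are illustrative and not part of the original answer, and fill = "right" is added so that short rows are padded with NA without a warning.

```r
library(tidyr)

# Shortened sample data (assumption: a subset of the data used in this answer)
DF <- data.frame(V1 = c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
                        "Place6-Place6",
                        "Place1-Place2-Place3-Place4"),
                 stringsAsFactors = FALSE)

# Count the pieces in each row and keep the largest count
n_cols <- max(lengths(strsplit(DF$V1, "-", fixed = TRUE)))

# Build the column names from that count instead of writing 1:8 by hand;
# fill = "right" pads rows that have fewer pieces with NA on the right
res <- separate(DF, V1,
                into = paste0("X", seq_len(n_cols)),
                sep = "-",
                fill = "right")

print(res)
```

Because the names are generated from the data, the same call keeps working if a longer row is added later.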

Several other approaches:

splitstackshape :

library(splitstackshape)
cSplit(DF, "V1", sep="-", direction = "wide")

stringi :

library(stringi)
as.data.frame(stri_list2matrix(stri_split_fixed(DF$V1, "-"), byrow = TRUE))

data.table :

library(data.table)
setDT(DF)[, paste0("v", 1:8) := tstrsplit(V1, "-")][, V1 := NULL][]

stringr :

library(stringr)
as.data.frame(str_split_fixed(DF$V1, "-",8))

which all give a similar result, although the column names differ and str_split_fixed pads short rows with empty strings instead of NA.
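For completeness, the same padding idea can be sketched in base R without any package. This is not one of the methods in the original answer, only an illustration under stated assumptions: the vector x is a made-up short sample, and parts, padded and res are illustrative names.

```r
# Made-up short sample (assumption: same "-"-separated format as the question)
x <- c("Place1-Place2-Place3", "Place6-Place6", "Place7")

# Split every string into its pieces
parts <- strsplit(x, "-", fixed = TRUE)

# The longest row decides how many columns are needed
n_cols <- max(lengths(parts))

# Extending a vector with `length<-` pads it with NA up to n_cols
padded <- lapply(parts, `length<-`, n_cols)

# Stack the padded vectors row by row and name the columns X1, X2, ...
res <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(res) <- paste0("X", seq_len(n_cols))

print(res)
```

The package-based versions above are usually shorter; this variant mainly shows what the padding step does.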

Used data:

DF <- data.frame(V1=c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
                      "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
                      "Place1-Place1-Place1-Place1-Place3-Place5",
                      "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
                      "Place6-Place6",
                      "Place1-Place2-Place3-Place4"))
