Split a string every x characters in a data frame


Problem description

I know there are some answers here about splitting a string every nth character, such as this one and this one. However, these are fairly question-specific and mostly deal with a single string rather than a data frame of multiple strings.

Example data

df <- data.frame(id = 1:2, seq = c('ABCDEFGHI', 'ZABCDJHIA'))

It looks like this:

  id       seq
1  1 ABCDEFGHI
2  2 ZABCDJHIA

Splitting every third character

I want to split the string in each row every third character, such that the resulting data frame looks like this:

id  1   2   3
1   ABC DEF GHI
2   ZAB CDJ HIA

What I have tried

I have used splitstackshape before to split a string into single characters, like so: df %>% cSplit('seq', sep = '', stripWhite = FALSE, type.convert = FALSE). I would love to have a similar function (or perhaps it is possible with cSplit) to split on every third character.

Recommended answer

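One option is separate (from tidyr, which is loaded with the tidyverse), passing the character positions to split at as sep: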

library(tidyverse)
df %>%
    separate(seq, into = paste0("x", 1:3), sep = c(3, 6))
# id  x1  x2  x3
#1  1 ABC DEF GHI
#2  2 ZAB CDJ HIA

If we want to make it more generic:

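# Cut positions are derived from the first row only, so this assumes
# every string has that length and that the length is a multiple of 3
n1 <- nchar(as.character(df$seq[1])) - 3
s1 <- seq(3, n1, by = 3)                      # here: 3, 6
nm1 <- paste0("x", seq_len(length(s1) + 1))   # one more column than cut points
df %>% 
    separate(seq, into = nm1, sep = s1)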
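
The generic version above takes its cut positions from the first row. As a rough sketch of the same idea without that restriction, the chunking can be done in base R with substring, which is a swapped-in technique and not part of the original answer. The helper name split_every is hypothetical, and the sketch assumes a character vector with at least one non-empty string.

```r
# Hypothetical helper (not from the original answer): split each string in a
# character vector into chunks of `n` characters. Returns a character matrix
# with one row per input string; shorter strings are padded with "".
split_every <- function(x, n) {
  x <- as.character(x)
  starts <- seq(1, max(nchar(x)), by = n)   # chunk start positions
  # substring() is vectorised over the start/stop positions
  chunks <- lapply(x, function(s) substring(s, starts, starts + n - 1))
  matrix(unlist(chunks), nrow = length(x), byrow = TRUE,
         dimnames = list(NULL, paste0("x", seq_along(starts))))
}

df <- data.frame(id = 1:2, seq = c("ABCDEFGHI", "ZABCDJHIA"))
result <- cbind(df["id"], split_every(df$seq, 3))
print(result)
```

Because the number of columns is taken from the longest string, rows of unequal length still line up; whether empty-string padding is the right choice for short rows depends on the data.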

Or, using base R, split the 'seq' column after every 3 characters by passing a regex lookbehind to strsplit, which returns a list, and then rbind the list elements:

df[paste0("x", 1:3)] <- do.call(rbind, 
           strsplit(as.character(df$seq), "(?<=.{3})", perl = TRUE))
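
To make the intermediate steps of the base R approach easier to follow, here is the same call broken into stages. The variable names pieces and parts are introduced only for illustration, and dropping the original 'seq' column at the end is an added step to match the output requested in the question.

```r
df <- data.frame(id = 1:2, seq = c("ABCDEFGHI", "ZABCDJHIA"))

# Step 1: strsplit returns a list with one character vector per row
pieces <- strsplit(as.character(df$seq), "(?<=.{3})", perl = TRUE)
str(pieces)

# Step 2: rbind stacks the vectors into a character matrix (rows x chunks).
# This relies on every string producing the same number of chunks.
parts <- do.call(rbind, pieces)

# Step 3: assign the matrix columns to new data frame columns
df[paste0("x", seq_len(ncol(parts)))] <- parts

# Optional: drop the original column, as in the question's desired output
df$seq <- NULL
print(df)
```

If the strings differ in length, the vectors in the list will not all have the same number of elements, so the rbind step should be checked or replaced with padding before it is used.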

NOTE: It is better to avoid column names that start with non-standard characters such as digits. For that reason, 'x' is prepended to the names.
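
As a small illustration of why the note matters, the sketch below compares a column literally named 1 with a syntactic name. The data frame bad is a made-up example, not part of the original answer.

```r
# A column literally named "1" is allowed only when name checking is disabled
bad <- data.frame(`1` = "ABC", check.names = FALSE)
names(bad)

# Such a column has to be quoted with backticks whenever it is referenced
first_chunk <- as.character(bad$`1`)

# make.names() shows the syntactic names R would generate instead
safe_names <- make.names(c("1", "2", "3"))
print(safe_names)
```

Prefixing the names yourself, as the answer does with "x", keeps them predictable and avoids the backtick quoting.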
