Split a string every x characters in a data frame
Question
I know there are some answers here about splitting a string every nth character, such as this one and this one. However, these are pretty question-specific and mostly relate to a single string rather than to a data frame of multiple strings.
Example data
df <- data.frame(id = 1:2, seq = c('ABCDEFGHI', 'ZABCDJHIA'))
Which looks like this:
id seq
1 1 ABCDEFGHI
2 2 ZABCDJHIA
Split every third character
I want to split the string in each row after every third character, such that the resulting data frame looks like this:
id 1 2 3
1 ABC DEF GHI
2 ZAB CDJ HIA
What I tried
I have used the splitstackshape package before to split a string on a single character, like so: df %>% cSplit('seq', sep = '', stripWhite = FALSE, type.convert = FALSE)
I would love to have a similar function (or perhaps it is possible with cSplit) to split on every third character.
Recommended answer
One option is separate from tidyr:
library(tidyverse)
df %>%
separate(seq, into = paste0("x", 1:3), sep = c(3, 6))
# id x1 x2 x3
#1 1 ABC DEF GHI
#2 2 ZAB CDJ HIA
If we want to make this more generic:
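# Assumes every string has the same length as the first one
n1 <- nchar(as.character(df$seq[1])) - 3
# Cut positions after every third character: 3, 6, ...
s1 <- seq(3, n1, by = 3)
# One more column name than there are cut positions
nm1 <- paste0("x", seq_len(length(s1) + 1))
df %>%
  separate(seq, into = nm1, sep = s1)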
Or, using base R: use strsplit with a regex lookbehind to split the 'seq' column after every 3 characters, which returns a list, and then rbind the list elements:
df[paste0("x", 1:3)] <- do.call(rbind,
strsplit(as.character(df$seq), "(?<=.{3})", perl = TRUE))
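As a possible alternative that avoids the lookaround regex and does not hard-code the number of pieces, here is a minimal base R sketch built on substring. The helper name split_every and its width argument are hypothetical, introduced only for illustration. It assumes the column has no NA values and at least one non-empty string. Strings shorter than the longest one are padded with empty strings in the trailing columns.

```r
# Hypothetical helper (not part of any package): split each string in x
# into consecutive chunks of `width` characters, one chunk per column.
split_every <- function(x, width) {
  x <- as.character(x)
  # Start position of each chunk, based on the longest string
  starts <- seq(1, max(nchar(x)), by = width)
  # substring() is vectorised over first/last, so each string yields
  # one character vector of chunks; rbind stacks them into a matrix
  chunks <- do.call(
    rbind,
    lapply(x, substring, first = starts, last = starts + width - 1)
  )
  colnames(chunks) <- paste0("x", seq_along(starts))
  chunks
}

df <- data.frame(id = 1:2, seq = c("ABCDEFGHI", "ZABCDJHIA"))
res <- cbind(df["id"], split_every(df$seq, 3))
res
```

Because the chunk boundaries are computed from the data, the same call should work for other widths, for example split_every(df$seq, 2).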
NOTE: It is better to avoid column names that start with non-standard characters such as digits. For that reason, 'x' is prepended to the names.
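To illustrate the point about names, this small base R sketch shows how R treats names that begin with a digit. The object names nm and d are made up for the example.

```r
# Names that start with a digit are not syntactic in R;
# make.names() repairs them by prepending "X".
nm <- as.character(1:3)
make.names(nm)   # "X1" "X2" "X3"

# data.frame() applies the same repair by default (check.names = TRUE),
# so digit-only names do not survive unless check.names = FALSE is set.
d <- data.frame(`1` = "ABC", `2` = "DEF")
names(d)         # "X1" "X2"
```

Choosing names such as x1, x2, x3 up front avoids this silent renaming and the need for backticks when referring to the columns.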