R split string at last whitespace chars using tidyr::separate
Problem description
Suppose I have a dataframe like this:
df<-data.frame(a=c("AA","BB"),b=c("short string","this is the longer string"))
I would like to split each string at the last space it contains, using a regex. I tried:
library(dplyr)
library(tidyr)
df%>%
separate(b,c("partA","partB"),sep=" [^ ]*$")
But this omits the second part of the string in the output. My desired output would look like this:
   a              partA  partB
1 AA              short string
2 BB this is the longer string
How can I do this?
Recommended answer
We can use extract from tidyr with capture groups ((...)). The first group ((.*)) greedily matches zero or more characters, followed by one or more whitespace characters (\\s+), followed by a second capture group that matches one or more characters that are not a space ([^ ]+) up to the end ($) of the string. Because .* is greedy, the first group takes everything up to the last space, so only the final word lands in the second group.
library(tidyr)
extract(df, b, into = c('partA', 'partB'), '(.*)\\s+([^ ]+)$')
#   a              partA  partB
#1 AA              short string
#2 BB this is the longer string
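If you would rather stay with separate, as in the question, one possible approach is a lookahead in sep. The original attempt loses the last word because everything sep matches is treated as the separator and discarded. The sketch below assumes separate hands sep to a regex engine that supports lookahead (this is not from the answer above, so treat it as an alternative to try rather than the accepted method). The name res is only for illustration.

```r
library(tidyr)

df <- data.frame(
  a = c("AA", "BB"),
  b = c("short string", "this is the longer string"),
  stringsAsFactors = FALSE
)

# " (?=[^ ]+$)" matches a single space only when everything after it,
# up to the end of the string, is non-space characters. The lookahead
# consumes nothing, so only that last space is removed as the separator
# and the final word is kept.
res <- separate(df, b, into = c("partA", "partB"), sep = " (?=[^ ]+$)")
res
```

Only the last space satisfies the lookahead, so each string is split into exactly two pieces and no extra or fill warnings should be triggered.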