R split string at last whitespace chars using tidyr::separate


Problem description

Suppose I have a dataframe like this:

df<-data.frame(a=c("AA","BB"),b=c("short string","this is the longer string"))



I would like to split each string at its last space, using a regex. I tried:

library(dplyr)
library(tidyr)
df%>%
  separate(b,c("partA","partB"),sep=" [^ ]*$")

But this omits the second part of the string in the output. My desired output would look like this:

   a              partA  partB
1 AA              short string
2 BB this is the longer string

How can I do this?

Recommended answer

We can use extract from tidyr with capture groups ((...)). We match zero or more characters (.*) inside the first capture group ((.*)), followed by one or more whitespace characters (\\s+), followed by a second capture group that matches one or more characters that are not a space ([^ ]+) up to the end ($) of the string. Because .* is greedy, the first group extends as far as it can, so the split lands on the last run of whitespace.

library(tidyr)
extract(df, b, into = c('partA', 'partB'), '(.*)\\s+([^ ]+)$')
#   a              partA  partB
#1 AA              short string
#2 BB this is the longer string
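
As for why the attempt in the question drops the second part: separate discards whatever text the sep pattern matches, and " [^ ]*$" matches the last space together with the final word. If you prefer to stay with separate, one possible workaround (a sketch, not part of the original answer) is to move the final word into a lookahead so that only the whitespace is consumed. This assumes the regex engine behind separate supports lookaheads, which is the case for the PCRE and ICU engines as far as I know. The variable name result is only for illustration.

```r
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE keeps b as character
# on older R versions as well.
df <- data.frame(a = c("AA", "BB"),
                 b = c("short string", "this is the longer string"),
                 stringsAsFactors = FALSE)

# "\\s+" is the separator that gets consumed. "(?=\\S+$)" is a lookahead:
# it requires that only non-whitespace characters follow up to the end of
# the string, but it does not consume them, so the last word is kept.
result <- separate(df, b, into = c("partA", "partB"), sep = "\\s+(?=\\S+$)")
print(result)
```

Only the last whitespace run is followed exclusively by non-whitespace characters, so the pattern matches once per string and separate produces exactly two pieces.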
