Tidyr Separate using regex

Question

I searched extensively for this and found similar questions, but nothing that quite fits. Hopefully this hasn't already been answered.

Let's say I have a column containing Y, N, and sometimes extra information:

    df<-data.frame(Names=c("Patient1","patient2","Patient3","Patient4","patient5"),Surgery=c("Y","N","Y-this kind of surgery","See note","Y"))

I'm trying to separate the Y or N into one column, and everything else from that column into another.

I tried:

    df%>%separate('Surgery',c("Surgery","Notes"), sep=" ")

This ends up with one column containing "See" and the next column containing "note".

    df%>%separate('Surgery',c("Surgery","Notes"), sep = '^Y|^N')

This one behaves a bit strangely.

    df%>%separate('Surgery',c("Surgery","Notes"), sep= "^[YN]?")

This splits the notes correctly, but removes the Y and N.

Does anybody know how to separate it? The result I'm looking for would have only Y or N in the Surgery column, with anything else pushed to a different column.

Recommended answer

We can use extract() from tidyr:

    library(tidyr)
    library(dplyr)
    df %>% 
      extract(Surgery, into = c("Surgery", "Notes"), "^([YN]*)[[:punct:]]*(.*)")
    #     Names Surgery                Notes
    #1 Patient1       Y                     
    #2 patient2       N                     
    #3 Patient3       Y this kind of surgery
    #4 Patient4                     See note
    #5 patient5       Y                     
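
To see what each capture group in that pattern contributes, the same regex can be tried in base R with sub(). This is only a rough sketch: the names surgery, pattern, flag and notes are made up here for illustration and are not part of the original answer.

```r
# Illustrative values copied from the Surgery column in the question
surgery <- c("Y", "N", "Y-this kind of surgery", "See note", "Y")

# Group 1: leading Y/N characters (possibly none)
# [[:punct:]]*: optional punctuation such as "-", matched but not captured
# Group 2: whatever remains
pattern <- "^([YN]*)[[:punct:]]*(.*)"

# The pattern matches the whole string, so replacing it with a single
# backreference leaves only that group's text
flag  <- sub(pattern, "\\1", surgery)
notes <- sub(pattern, "\\2", surgery)

print(data.frame(flag = flag, notes = notes))
```

Rows with no leading Y or N get an empty string in the first column, which is consistent with the blank Surgery value shown for Patient4. Because `[YN]*` accepts any run of those letters, a pattern using `[YN]?` instead might be preferable if the flag should be at most one character.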
