Tidyr Separate using regex
Problem description
I searched and searched for this and found similar stuff, but nothing quite right. Hopefully this hasn't already been answered.
Let's say I have a column with Y, N, and sometimes extra information:
df<-data.frame(Names=c("Patient1","patient2","Patient3","Patient4","patient5"),Surgery=c("Y","N","Y-this kind of surgery","See note","Y"))
And I'm trying to separate out the Y or N into one column, and everything else from that column into another.
I tried:
df %>% separate(Surgery, c("Surgery", "Notes"), sep = " ")
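This ends up with one column containing "See" and the next containing "note" (and "Y-this kind of surgery" is cut down to "Y-this" and "kind", with a warning about the discarded pieces).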
df %>% separate(Surgery, c("Surgery", "Notes"), sep = "^Y|^N")
This gives somewhat odd results.
df %>% separate(Surgery, c("Surgery", "Notes"), sep = "^[YN]?")
This splits off the notes correctly, but removes the Y and N.
Does anybody know how to separate it? The result I'm looking for would have only Y or N in the Surgery column, with anything else pushed to a different column.
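Why the last attempt loses the Y and N: separate() treats whatever sep matches as the delimiter between the new columns and drops it, so with sep = "^[YN]?" the Y or N itself is the part that disappears. The base R sketch below uses plain sub() and regexpr(), no tidyr, and the vector x simply copies the Surgery values from the question. It shows the two halves: the text that survives around the match, and the match that gets thrown away.

```r
# Surgery values copied from the question's data frame
x <- c("Y", "N", "Y-this kind of surgery", "See note", "Y")

# Text left over once the "^[YN]?" match is removed
# (roughly what survives a separator-based split)
leftover <- sub("^[YN]?", "", x)

# The matched text itself, i.e. what a separator-based split discards.
# "See note" gives a zero-length match, so its entry is "".
matched <- regmatches(x, regexpr("^[YN]?", x))

leftover
matched
```

Note that the "-" stays attached to the leftover text here. Keeping the Y/N requires capturing it rather than matching it as a separator, which is what a capture-group approach does.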
Recommended answer
We can use extract from tidyr, which turns each regex capture group into a new column:
library(tidyr)
library(dplyr)
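df %>%
  # group 1: leading Y/N (may be empty); [[:punct:]]* drops a separator such as "-"; group 2: the rest
  extract(Surgery, into = c("Surgery", "Notes"), regex = "^([YN]*)[[:punct:]]*(.*)")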
#     Names Surgery                Notes
#1 Patient1       Y
#2 patient2       N
#3 Patient3       Y this kind of surgery
#4 Patient4                     See note
#5 patient5       Y
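If it helps to see what extract() is doing with the two capture groups, here is a rough base R equivalent. It is a sketch, not tidyr's actual implementation: regexec(perl = TRUE) locates the pattern, regmatches() returns the whole match followed by each capture group, and groups 1 and 2 become the Surgery and Notes columns. The helper split_surgery(), the extra value "No record", and the stricter pattern are all made up for illustration.

```r
df <- data.frame(
  Names = c("Patient1", "patient2", "Patient3", "Patient4", "patient5"),
  Surgery = c("Y", "N", "Y-this kind of surgery", "See note", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking extract(): element 1 of each regmatches()
# result is the whole match, elements 2 and 3 are the two capture groups.
split_surgery <- function(x, pattern = "^([YN]*)[[:punct:]]*(.*)") {
  parts <- regmatches(x, regexec(pattern, x, perl = TRUE))
  data.frame(
    Surgery = vapply(parts, `[`, character(1), 2),
    Notes   = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

result <- cbind(df["Names"], split_surgery(df$Surgery))
print(result)

# Caveat: "^([YN]*)" also grabs a leading capital Y or N from free text,
# so the made-up note "No record" becomes Surgery = "N", Notes = "o record".
split_surgery("No record")

# Hypothetical stricter pattern: take Y/N only when it is not followed by a
# letter or digit, then drop any punctuation or whitespace after it.
strict <- "^((?:[YN](?![[:alnum:]]))?)[[:punct:][:space:]]*(.*)"
split_surgery(c(df$Surgery, "No record"), strict)
```

The stricter pattern keeps group 1 always participating (it can match an empty string), so every row still yields two columns. It relies on lookahead, which PCRE supports via perl = TRUE; whether your tidyr version's regex engine accepts the same pattern in extract() is an assumption worth testing on your own data.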