Create two columns with multiple separators


Problem description

I have a data frame such as:

COl1
scaffold_97606_2-BACs_-__SP1_1
UELV01165908.1_2-BACs_+__SP2_2
UXGC01046554.1_9-702_+__SP3_3
scaffold_12002_1087-1579_-__SP4_4

and I would like to split that column into two columns to get:

COL1           COL2 
scaffold_97606 2-BACs_-__SP1_1
UELV01165908.1 2-BACs_+__SP2_2
UXGC01046554.1 9-702_+__SP3_3
scaffold_12002 1087-1579_-__SP4_4

As you can see, the text around the separator varies: it can be .Number_ or Number_Number.

So far I have written:

df2 <- df1 %>%
    separate(COl1, paste0('col', 1:2), sep = " the separator patterns ", extra = "merge")

but I do not know which pattern to use in the " the separator patterns " part.

Solution

You may use

> library(tidyr)  # provides separate() and re-exports %>%
> df1 %>%
    separate(COl1, paste0('col', 1:2), sep = "(?<=\\d)_(?=\\d+-)", extra = "merge")
            col1               col2
1 scaffold_97606    2-BACs_-__SP1_1
2 UELV01165908.1    2-BACs_+__SP2_2
3 UXGC01046554.1     9-702_+__SP3_3
4 scaffold_12002 1087-1579_-__SP4_4

See the regex demo

Pattern details

  • (?<=\d) - a positive lookbehind that requires a digit immediately to the left of the current location
  • _ - an underscore
  • (?=\d+-) - a positive lookahead that requires one or more digits and then a - immediately to the right of the current location.
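
To check the pattern without tidyr, here is a minimal base R sketch. It assumes the four sample strings from the question; the names x, pos and result are made up for illustration. It locates the first underscore that satisfies both lookarounds and cuts each string at that position.

```r
# Sample values copied from the question.
x <- c("scaffold_97606_2-BACs_-__SP1_1",
       "UELV01165908.1_2-BACs_+__SP2_2",
       "UXGC01046554.1_9-702_+__SP3_3",
       "scaffold_12002_1087-1579_-__SP4_4")

# perl = TRUE is needed so that base R accepts the lookbehind/lookahead syntax.
# regexpr() returns the position of the first match in each string.
pos <- regexpr("(?<=\\d)_(?=\\d+-)", x, perl = TRUE)

# Keep everything before the matched underscore in col1
# and everything after it in col2.
result <- data.frame(
  col1 = substr(x, 1, pos - 1),
  col2 = substr(x, pos + 1, nchar(x)),
  stringsAsFactors = FALSE
)
print(result)
```

The other underscores are skipped because they either follow a non-digit (scaffold_, BACs_) or are not followed by digits and a hyphen (SP1_1, 702_+).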
