Separate character string at first digit with "*" in the string

Problem description

I think this is an easy one, but I cannot see what I'm missing. I want to split the string at the first digit. It works great until there is a non-alphanumeric symbol in the string. Help!

What I tried:

library(tidyr)  # provides separate()

pet <- c("Dog 100", "Cat? 340")
df <- as.data.frame(pet)
df_split <- separate(df, pet, into = c("Animal", "Total"), sep = "(?<=[a-zA-Z])\\s*(?=[0-9])")

The first row is split correctly, but the second row is not split. Where am I going wrong?

Recommended answer

Note that for the current scenario it is enough to split on one or more whitespace characters that are followed by one or more digits running to the end of the string:

> separate(df, pet, into = c("Animal", "Total"), sep = "\\s+(?=[0-9]+$)")
## =>   Animal Total
## => 1    Dog   100
## => 2   Cat?   340
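
To see why the original separator misses the second row, here is a minimal sketch in base R. It uses strsplit() with perl = TRUE as a stand-in for separate(), which is an assumption: the two functions may use different regex engines, but both patterns should behave the same way on these inputs. The lookbehind (?<=[a-zA-Z]) requires a letter immediately before the whitespace, and in "Cat? 340" that character is "?".

```r
pet <- c("Dog 100", "Cat? 340")

# Original separator: a letter must sit right before the whitespace.
original <- "(?<=[a-zA-Z])\\s*(?=[0-9])"
# Suggested separator: whitespace followed by digits up to the end of the string.
suggested <- "\\s+(?=[0-9]+$)"

# strsplit() is used here only to illustrate the patterns (stand-in for separate()).
split_original  <- strsplit(pet, original,  perl = TRUE)
split_suggested <- strsplit(pet, suggested, perl = TRUE)

split_original[[2]]   # "Cat? 340" (no split: "?" is not a letter)
split_suggested[[2]]  # "Cat?" "340"
```

Dropping the lookbehind removes the dependence on what precedes the whitespace, so any symbol may end the first part.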

See the regex demo.

However, in the general case it is much easier to use tidyr::extract here, since the pattern you need is much simpler:

^(\D*?)\s*(\d.*)

Note that if your strings can contain line breaks, you need to prepend (?s) to the pattern. This is the so-called DOTALL modifier, which lets . match line break characters in an ICU pattern.

See the regex demo.
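
The effect of (?s) can be checked with a small sketch. It assumes base R's PCRE engine (perl = TRUE) instead of ICU; the inline flag is written the same way in both, and the input string with a line break is made up for illustration.

```r
# Hypothetical input that contains a line break after the number.
x <- "Dog 100\nkg"

plain  <- "^(\\D*?)\\s*(\\d.*)"
dotall <- "(?s)^(\\D*?)\\s*(\\d.*)"

# Element 1 is the whole match, elements 2 and 3 are the capture groups.
m_plain  <- regmatches(x, regexec(plain,  x, perl = TRUE))[[1]]
m_dotall <- regmatches(x, regexec(dotall, x, perl = TRUE))[[1]]

m_plain[3]   # "100"      (. stops at the line break)
m_dotall[3]  # "100\nkg"  (. also matches the line break)
```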

Regex details

  • ^ - start of string
  • (\D*?) - Group 1 (here, the Animal column): any 0+ non-digit characters, as few as possible
  • \s* - 0 or more whitespace characters
  • (\d.*) - Group 2 (here, the Total column): a digit followed by any 0+ characters (other than line break characters if (?s) is not used), as many as possible (* is a greedy quantifier).
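
The two groups can also be inspected without tidyr. The following is a minimal sketch using base R's sub() with backreferences and perl = TRUE, which is an assumption about the engine; the variable names animal and total are illustrative only.

```r
pet <- c("Dog 100", "Cat? 340")
pattern <- "^(\\D*?)\\s*(\\d.*)"

# The pattern consumes the whole string, so replacing the match with a
# backreference leaves just that capture group.
animal <- sub(pattern, "\\1", pet, perl = TRUE)  # Group 1
total  <- sub(pattern, "\\2", pet, perl = TRUE)  # Group 2

data.frame(Animal = animal, Total = total)
```

One difference to keep in mind: sub() returns a string unchanged when the pattern does not match (for example, when there is no digit at all), so this is only an illustration of the groups and not a drop-in replacement for extract().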

R code snippet:

library(tidyr)
# Use the pattern described above; \\s* keeps the separating whitespace out of Animal.
df_split <- extract(df, pet, into = c("Animal", "Total"), regex = "^(\\D*?)\\s*(\\d.*)")
df_split
# =>   Animal Total
# => 1    Dog   100
# => 2   Cat?   340
