Python split on multiple delimiters bug?


Problem description

I was looking at the responses to this earlier-asked question:

Split Strings with Multiple Delimiters?

For my variant of this problem, I wanted to split on everything that wasn't from a specific set of chars, which led me to a solution I liked, until I found this apparent bug. Is this a bug or some quirk of Python I'm unfamiliar with?

>>> import re
>>> b = "Which_of'these-markers/does,it:choose to;split!on?"
>>> b1 = re.split("[^a-zA-Z0-9_'-/]+", b)
>>> b1
["Which_of'these-markers/does,it", 'choose', 'to', 'split', 'on', '']

I'm not understanding why it doesn't split on a comma (','), given that a comma is not in my exception list?

Solution

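The '-/ inside the character class creates a range from ' (code point 39) to / (code point 47), and that range includes the comma (code point 44). The comma is therefore one of the characters the negated class excludes, so it is never treated as a delimiter.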
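To make the accidental range concrete, here is a small sketch that lists the characters between ' and / and checks that the pattern from the question does not match a comma. The variable names `covered` and `pattern` are only illustrative.

```python
import re

# Characters covered by the range '-/ (code points 39 through 47).
covered = "".join(chr(c) for c in range(ord("'"), ord("/") + 1))
print(covered)  # '()*+,-./

# The comma falls inside that range, so the negated class does not match it.
pattern = re.compile(r"[^a-zA-Z0-9_'-/]+")
print(pattern.search(","))  # None
```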

When you need to put a literal hyphen in a Python re pattern, put it:

  • at the start: [-A-Z] (matches an uppercase ASCII letter and -)
  • at the end: [A-Z()-] (matches an uppercase ASCII letter, (, ) or -)
  • after a valid range: [A-Z-+] (matches an uppercase ASCII letter, - or +)
  • or just escape it.

You cannot put it after a shorthand class and right before a standalone symbol: [\w-+] raises a bad character range error. That form is valid in .NET and some other regex flavors, but not in Python re.

Put the hyphen at the end of the character class, or escape it.
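The placement rules above can be checked with a short sketch. It assumes Python 3 (for `re.fullmatch`); the names `literal_hyphen_classes`, `compiled`, and `message` are illustrative only.

```python
import re

# Each of these classes contains a literal hyphen: at the start, at the end,
# after a valid range, and escaped.
literal_hyphen_classes = (r"[-A-Z]", r"[A-Z()-]", r"[A-Z-+]", r"[A-Z\-+]")
for pat in literal_hyphen_classes:
    assert re.fullmatch(pat, "-") is not None

# A hyphen between a shorthand class and another symbol is rejected.
message = ""
try:
    re.compile(r"[\w-+]")
    compiled = True
except re.error as exc:
    compiled = False
    message = str(exc)

print(compiled, message)
```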

Use

re.split(r"[^a-zA-Z0-9_'/-]+", b)
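As a quick check, this sketch applies the corrected pattern to the string from the question. Filtering out the empty piece at the end is an optional extra step, not part of the original answer.

```python
import re

b = "Which_of'these-markers/does,it:choose to;split!on?"

# Hyphen moved to the end of the class, so it is a literal, not a range.
parts = re.split(r"[^a-zA-Z0-9_'/-]+", b)
print(parts)
# ["Which_of'these-markers/does", 'it', 'choose', 'to', 'split', 'on', '']

# The trailing '' comes from the final "?"; drop empty pieces if unwanted.
words = [p for p in parts if p]
```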

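In Python 2.7, where \w is equivalent to [a-zA-Z0-9_] by default, you may even contract it to the following (in Python 3, \w also matches non-ASCII word characters in str patterns unless re.ASCII is passed):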

re.split(r"[^\w'/-]+", b)
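A minimal sketch of how the \w form behaves on Python 3, where the `re.ASCII` flag restricts \w to [a-zA-Z0-9_]. The sample string "café,bar" is made up for illustration.

```python
import re

b = "Which_of'these-markers/does,it:choose to;split!on?"

# On pure-ASCII input the contracted pattern gives the same result.
short_parts = re.split(r"[^\w'/-]+", b)
long_parts = re.split(r"[^a-zA-Z0-9_'/-]+", b)
print(short_parts == long_parts)  # True

# With non-ASCII text the default (Unicode) \w keeps "é" as a word character,
# while re.ASCII treats it as a delimiter.
sample = "café,bar"
print(re.split(r"[^\w'/-]+", sample))                  # ['café', 'bar']
print(re.split(r"[^\w'/-]+", sample, flags=re.ASCII))  # ['caf', 'bar']
```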
