Python split on multiple delimiters bug?
Problem description
I was looking at the responses to this earlier-asked question:
Split Strings with Multiple Delimiters?
For my variant of this problem, I wanted to split on everything that wasn't from a specific set of chars. That led me to a solution I liked, until I found this apparent bug. Is this a bug or some quirk of Python I'm unfamiliar with?
>>> import re
>>> b = "Which_of'these-markers/does,it:choose to;split!on?"
>>> b1 = re.split("[^a-zA-Z0-9_'-/]+", b)
>>> b1
["Which_of'these-markers/does,it", 'choose', 'to', 'split', 'on', '']
I don't understand why it doesn't split on a comma (','), given that a comma is not in my exception list.
The '-/ inside the character class creates a range running from ' (0x27) to / (0x2F), and that range includes the comma (0x2C).
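To see which characters the accidental range covers, one can enumerate the code points between ' and /. This is a minimal sketch; the variable name covered is only illustrative.

```python
import re

# Characters spanned by the accidental range '-/ (code points 0x27..0x2F).
covered = "".join(chr(c) for c in range(ord("'"), ord("/") + 1))
print(covered)  # '()*+,-./

# Because the comma is inside that range, the negated class does not split on it.
b = "Which_of'these-markers/does,it:choose to;split!on?"
parts = re.split(r"[^a-zA-Z0-9_'-/]+", b)
print(parts[0])  # Which_of'these-markers/does,it
```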
When you need to put a literal hyphen in a Python re pattern, put it:
- at the start: [-A-Z] (matches an uppercase ASCII letter and -)
- at the end: [A-Z()-] (matches an uppercase ASCII letter, (, ) or -)
- after a valid range: [A-Z-+] (matches an uppercase ASCII letter, - or +)
- or just escape it.
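The placements listed above can be checked with a short script. This is only a sketch: the dictionary keys and the escaped variant [A-Z\-+] are illustrative choices, not patterns taken from the answer.

```python
import re

# Each class below is meant to accept a literal hyphen.
hyphen_classes = {
    "start": r"[-A-Z]",
    "end": r"[A-Z()-]",
    "after a range": r"[A-Z-+]",
    "escaped": r"[A-Z\-+]",  # illustrative escaped form
}

results = {}
for placement, pattern in hyphen_classes.items():
    matches_hyphen = re.fullmatch(pattern, "-") is not None
    matches_comma = re.fullmatch(pattern, ",") is not None
    results[placement] = (matches_hyphen, matches_comma)
    print(placement, pattern, matches_hyphen, matches_comma)
```

In every case the hyphen should match and the comma should not, since no unintended range is formed.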
You cannot put it after a shorthand, right before a standalone symbol (as in [\w-+], it will cause a bad character range error). This is valid in .NET and some other regex flavors, but is not valid in Python re.
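The error can be reproduced by compiling such a pattern and catching re.error. A minimal sketch follows; the message variable is just for illustration, and the exact wording of the error text may vary between Python versions.

```python
import re

message = None
try:
    # A shorthand class cannot be the start of a range, so this fails to compile.
    re.compile(r"[\w-+]")
except re.error as exc:
    message = str(exc)

print(message)
```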
Put the hyphen at the end of the character class, or escape it.
Use
re.split(r"[^a-zA-Z0-9_'/-]+", b)
In Python 2.7, you may even contract it to
re.split(r"[^\w'/-]+", b)
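Run against the string from the question, both corrected patterns now split on the comma. The sketch below assumes Python 3, where \w matches Unicode word characters by default for str patterns, so re.ASCII is passed to keep the shorthand equivalent to a-zA-Z0-9_. The final list comprehension is an optional extra for dropping the trailing empty string.

```python
import re

b = "Which_of'these-markers/does,it:choose to;split!on?"

# Hyphen moved to the end of the class, so no range is formed.
explicit = re.split(r"[^a-zA-Z0-9_'/-]+", b)
print(explicit)
# ["Which_of'these-markers/does", 'it', 'choose', 'to', 'split', 'on', '']

# Shorthand form; re.ASCII restricts \w to [a-zA-Z0-9_].
shorthand = re.split(r"[^\w'/-]+", b, flags=re.ASCII)

# The input ends with a delimiter, so the last item is empty; filter it if unwanted.
tokens = [t for t in explicit if t]
```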