Fast way to split alpha and numeric chars in a Python string
I am trying to work out a simple function to capture typos, e.g.:
"Westminister15"
"Westminister15London"
"23Westminister15London"
After fixating:
["Westminister", "15"]
["Westminister", "15", "London"]
["23", "Westminister", "15", "London"]
First attempt:
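import re

def fixate(query):
    digit_pattern = re.compile(r'\D')  # split on non-digits, leaving the digit runs
    alpha_pattern = re.compile(r'\d')  # split on digits, leaving the non-digit runs
    # filter(None, ...) drops the empty strings that split() produces
    digits = list(filter(None, digit_pattern.split(query)))
    alphas = list(filter(None, alpha_pattern.split(query)))
    print(digits)
    print(alphas)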
Result:
fixate("Westminister15London")
> ['15']
> ['Westminister', 'London']
However, I think this could be done more efficiently, and I still get bad results when I try something like:
fixate("Westminister15London England")
> ['15']
> ['Westminister', 'London England']
Obviously it should list London and England separately, but I feel my function will get overly patched and that there's a simpler approach.
This question is somewhat equivalent to this PHP question.
The problem is that Python's re.split() doesn't split on zero-length matches (this is true of Python versions before 3.7). But you can get the desired result with re.findall():
>>> re.findall(r"[^\W\d_]+|\d+", "23Westminister15London")
['23', 'Westminister', '15', 'London']
>>> re.findall(r"[^\W\d_]+|\d+", "Westminister15London England")
['Westminister', '15', 'London', 'England']
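\d+ matches a run of one or more digits, and [^\W\d_]+ matches a run of one or more letters (word characters that are neither digits nor underscores). Anything else, such as whitespace or punctuation, matches neither alternative and is dropped from the result.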
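As a minimal sketch, the findall() approach above can be wrapped in a drop-in replacement for the original fixate() function. The module-level name TOKEN_PATTERN and the choice to return the list instead of printing it are assumptions made for this illustration, not part of the original answer.

```python
import re

# Hypothetical name: a precompiled version of the pattern from the answer.
# [^\W\d_]+ matches a run of letters, \d+ matches a run of digits.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+|\d+")


def fixate(query):
    """Return the alphabetic and numeric runs of `query`, in order."""
    return TOKEN_PATTERN.findall(query)


if __name__ == "__main__":
    for text in ("Westminister15",
                 "23Westminister15London",
                 "Westminister15London England"):
        print(fixate(text))
```

Compiling the pattern once avoids looking it up on every call, which may matter if the function runs over many queries. Because whitespace matches neither alternative, "London England" comes back as two separate items without any extra handling.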