In Python, how can I naturally sort a list of alphanumeric strings such that alpha characters sort ahead of numeric characters?
Problem description
This is a fun little challenge that confronted me recently. I'll provide my answer below, but I'm curious to see whether there are more elegant or efficient solutions.
A delineation of the requirements as they were presented to me:
- Strings are alphanumeric (see test dataset below)
- Strings should be sorted naturally (see this question for explanation)
- Alpha characters should be sorted ahead of numeric characters (i.e. 'abc' before '100')
- Uppercase instances of alpha chars should be sorted ahead of lowercase instances (i.e. 'ABc', 'Abc', 'abc')
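To make the requirements concrete, here is a small sketch of what Python's default string sort does with one of the test lists. It is only an illustration of the starting point, not part of any answer; the variable names are my own.

```python
# Default sorting compares strings character by character using code points,
# so digits ('0'-'9') sort ahead of letters and '10a' lands before '1a'.
words = ['a2', '1a', '10a', 'a1', 'a100']

default_order = sorted(words)
print(default_order)            # ['10a', '1a', 'a1', 'a100', 'a2']

# The order the requirements ask for instead (taken from the test dataset):
wanted_order = ['a1', 'a2', 'a100', '1a', '10a']
print(default_order == wanted_order)   # False
```

The default order is wrong on both counts: numeric strings come first, and 'a100' sorts ahead of 'a2' because the comparison never treats '100' and '2' as numbers.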
Here is a test dataset:
test_cases = [
    # (unsorted list, sorted list)
    (list('bca'), ['a', 'b', 'c']),
    (list('CbA'), ['A', 'b', 'C']),
    (list('r0B9a'), ['a', 'B', 'r', '0', '9']),
    (['a2', '1a', '10a', 'a1', 'a100'], ['a1', 'a2', 'a100', '1a', '10a']),
    (['GAM', 'alp2', 'ALP11', '1', 'alp100', 'alp10', '100', 'alp1', '2'],
     ['alp1', 'alp2', 'alp10', 'ALP11', 'alp100', 'GAM', '1', '2', '100']),
    (list('ra0b9A'), ['A', 'a', 'b', 'r', '0', '9']),
    (['Abc', 'abc', 'ABc'], ['ABc', 'Abc', 'abc']),
]
Bonus test case
This case is inspired by Janne Karila's comment below. The selected answer currently fails it, though that wouldn't really be a practical concern in my case:
(['0A', '00a', 'a', 'A', 'A0', '00A', '0', 'a0', '00', '0a'],
['A', 'a', 'A0', 'a0', '0', '00', '0A', '00A', '0a', '00a'])
Recommended answer
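import re

re_natural = re.compile(r'[0-9]+|[^0-9]+')

def natural_key(s):
    # Alpha runs are tagged 0 and digit runs 1, so letters sort first;
    # the original string is appended as a case-sensitive tie-breaker.
    return [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in re_natural.findall(s)] + [s]

for case in test_cases:
    print(case[1])
    print(sorted(case[0], key=natural_key))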
['a', 'b', 'c']
['a', 'b', 'c']
['A', 'b', 'C']
['A', 'b', 'C']
['a', 'B', 'r', '0', '9']
['a', 'B', 'r', '0', '9']
['a1', 'a2', 'a100', '1a', '10a']
['a1', 'a2', 'a100', '1a', '10a']
['alp1', 'alp2', 'alp10', 'ALP11', 'alp100', 'GAM', '1', '2', '100']
['alp1', 'alp2', 'alp10', 'ALP11', 'alp100', 'GAM', '1', '2', '100']
['A', 'a', 'b', 'r', '0', '9']
['A', 'a', 'b', 'r', '0', '9']
['ABc', 'Abc', 'abc']
['ABc', 'Abc', 'abc']
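To see why this key produces the order above, it can help to print a few keys. The sketch below repeats the answer's definitions so that it runs on its own under Python 3; the sample strings are my own choice.

```python
import re

re_natural = re.compile(r'[0-9]+|[^0-9]+')

def natural_key(s):
    # Same key as in the answer: (0, lowercase text) for alpha runs,
    # (1, integer value) for digit runs, then the original string.
    return [(1, int(c)) if c.isdigit() else (0, c.lower())
            for c in re_natural.findall(s)] + [s]

print(natural_key('alp10'))   # [(0, 'alp'), (1, 10), 'alp10']
print(natural_key('ALP11'))   # [(0, 'alp'), (1, 11), 'ALP11']
print(natural_key('100'))     # [(1, 100), '100']

# Strings that differ only in case share the same natural part, so the
# trailing original string decides, and uppercase sorts first.
print(sorted(['Abc', 'abc', 'ABc'], key=natural_key))   # ['ABc', 'Abc', 'abc']
```

One caveat if you run this under Python 3: when one key's natural part is a prefix of another's (for example 'a' and 'a1'), the trailing string ends up being compared with a tuple, which raises a TypeError. None of the seven test cases above trigger this, but the bonus case does.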
I decided to revisit this question and see if it would be possible to handle the bonus case. It requires being more sophisticated in the tie-breaker portion of the key. To match the desired results, the alpha parts of the key must be considered before the numeric parts. I also added a marker between the natural section of the key and the tie-breaker so that short keys always come before long ones.
def natural_key2(s):
    parts = re_natural.findall(s)
    natural = [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in parts]
    ties_alpha = [c for c in parts if not c.isdigit()]
    ties_numeric = [c for c in parts if c.isdigit()]
    return natural + [(-1,)] + ties_alpha + ties_numeric
This generates identical results for the test cases above, plus the desired output for the bonus case:
['A', 'a', 'A0', 'a0', '0', '00', '0A', '00A', '0a', '00a']
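For anyone who wants to confirm this themselves, the following is a minimal self-checking sketch for Python 3. It repeats natural_key2 and the test data from the question; the names all_cases and failures are mine and not part of the original answer.

```python
import re

re_natural = re.compile(r'[0-9]+|[^0-9]+')

def natural_key2(s):
    parts = re_natural.findall(s)
    natural = [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in parts]
    ties_alpha = [c for c in parts if not c.isdigit()]
    ties_numeric = [c for c in parts if c.isdigit()]
    # (-1,) sorts below every (0, ...) and (1, ...) tuple, so a key whose
    # natural part is a prefix of another's always comes first.
    return natural + [(-1,)] + ties_alpha + ties_numeric

# (unsorted list, expected sorted list), bonus case last
all_cases = [
    (list('bca'), ['a', 'b', 'c']),
    (list('CbA'), ['A', 'b', 'C']),
    (list('r0B9a'), ['a', 'B', 'r', '0', '9']),
    (['a2', '1a', '10a', 'a1', 'a100'], ['a1', 'a2', 'a100', '1a', '10a']),
    (['GAM', 'alp2', 'ALP11', '1', 'alp100', 'alp10', '100', 'alp1', '2'],
     ['alp1', 'alp2', 'alp10', 'ALP11', 'alp100', 'GAM', '1', '2', '100']),
    (list('ra0b9A'), ['A', 'a', 'b', 'r', '0', '9']),
    (['Abc', 'abc', 'ABc'], ['ABc', 'Abc', 'abc']),
    (['0A', '00a', 'a', 'A', 'A0', '00A', '0', 'a0', '00', '0a'],
     ['A', 'a', 'A0', 'a0', '0', '00', '0A', '00A', '0a', '00a']),
]

# Collect every case whose sorted result differs from the expected list.
failures = [(unsorted, expected) for unsorted, expected in all_cases
            if sorted(unsorted, key=natural_key2) != expected]
print(failures)   # []
```

Because the marker is itself a tuple, keys of different lengths are decided by a tuple-to-tuple comparison before the tie-breaker strings are ever reached, so the key also sorts cleanly under Python 3.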