Sort a sublist of elements in a list leaving the rest in place
Question
Say I have a sorted list of strings as in:
['A', 'B', 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
Now I want to sort based on the trailing numerical value for the B's, so that I have:
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
One possible algorithm would be to hash up a regex like regex = re.compile(r'(B)(\d*)'), find the indices of the first and last B, slice the list, sort the slice using the regex's second group, then insert the sorted slice. However, this seems like too much of a hassle. Is there a way to write a key function that "leaves the item in place" if it does not match the regex and only sorts the items (sublists) that match?
Note: the above is just an example; I don't necessarily know the pattern (or I may want to also sort C's, or any string that has a trailing number in there). Ideally, I'm looking for an approach to the general problem of sorting only subsequences which match a given criterion (or failing that, just those that meet the specific criterion of a given prefix followed by a string of digits).
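For reference, the slice-and-reinsert idea described above can be generalised so that the matching items need not be contiguous: collect the positions of the items that satisfy a criterion, sort just those items, and write them back into the same positions. The following is only a minimal sketch of that idea; sort_matching, its predicate and key parameters, and the b_digits pattern are hypothetical names chosen for illustration.

```python
import re

def sort_matching(items, predicate, key=None):
    """Return a copy of items in which only the elements satisfying
    predicate are sorted (among their own positions); all other
    elements stay where they were. Hypothetical helper."""
    result = list(items)
    # Positions of the elements that take part in the sort.
    positions = [i for i, item in enumerate(result) if predicate(item)]
    # Sort only those elements, then put them back in position order.
    ordered = sorted((result[i] for i in positions), key=key)
    for i, item in zip(positions, ordered):
        result[i] = item
    return result

# Assumed criterion: the prefix 'B' followed by at least one digit.
b_digits = re.compile(r'B(\d+)$')

data = ['A', 'B', 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
sorted_data = sort_matching(
    data,
    predicate=b_digits.match,
    key=lambda s: int(b_digits.match(s).group(1)),
)
print(sorted_data)
```

Unlike a pure key function, this does not rely on the input already being sorted, because the non-matching items are never moved.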
Recommended answer
In the simple case where you just want to sort trailing digits numerically and their non-digit prefixes alphabetically, you need a key function which splits each item into non-digit and digit components as follows:
'AB123' -> ['AB', 123]
'CD' -> ['CD']
'456' -> ['', 456]
Note: In the last case, the empty string '' is not strictly necessary in CPython 2.x, as integers sort before strings, but that's an implementation detail rather than a guarantee of the language. In Python 3.x it is necessary, because strings and integers can't be compared at all.
You can build such a key function using a list comprehension and re.split():
import re

def trailing_digits(x):
    return [
        int(g) if g.isdigit() else g
        for g in re.split(r'(\d+)$', x)
    ]
Here it is in action:
>>> s1 = ['11', '2', 'A', 'B', 'B1', 'B11', 'B2', 'B21', 'C', 'C11', 'C2']
>>> sorted(s1, key=trailing_digits)
['2', '11', 'A', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'C2', 'C11']
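To see what the sort is actually comparing, it can help to print the keys themselves. The snippet below repeats trailing_digits so that it runs on its own. One detail worth noting: re.split() with a capturing group anchored at the end of the string also yields a trailing empty string, so the real keys are one element longer than the idealised mapping shown earlier. That does not affect the ordering here, since every key with trailing digits ends in the same ''.

```python
import re

def trailing_digits(x):
    # Same key function as above, repeated so this snippet is self-contained.
    return [
        int(g) if g.isdigit() else g
        for g in re.split(r'(\d+)$', x)
    ]

for s in ['AB123', 'CD', '456']:
    print(s, '->', trailing_digits(s))
# AB123 -> ['AB', 123, '']
# CD -> ['CD']
# 456 -> ['', 456, '']
```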
Once you add the restriction that only strings with a particular prefix or prefixes have their trailing digits sorted numerically, things get a little more complicated.
The following function builds and returns a key function which fulfils the requirement:
def prefixed_digits(*prefixes):
    disjunction = '|'.join('^' + re.escape(p) for p in prefixes)
    pattern = re.compile(r'(?<=%s)(\d+)$' % disjunction)
    def key(x):
        return [
            int(g) if g.isdigit() else g
            for g in re.split(pattern, x)
        ]
    return key
The main difference here is that a precompiled regex is created (containing a lookbehind constructed from the supplied prefix or prefixes), and a key function using that regex is returned.
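As a small illustration of what that precompiled regex looks like, the snippet below builds the pattern for the prefixes 'B' and 'C' in the same way prefixed_digits() does and shows how it splits a few strings. The variable names are only for this example.

```python
import re

prefixes = ('B', 'C')

# Same construction as inside prefixed_digits(): one anchored
# alternative per prefix, wrapped in a lookbehind.
disjunction = '|'.join('^' + re.escape(p) for p in prefixes)
pattern = re.compile(r'(?<=%s)(\d+)$' % disjunction)

print(pattern.pattern)        # (?<=^B|^C)(\d+)$
print(pattern.split('B11'))   # ['B', '11', '']
print(pattern.split('D12'))   # ['D12']
```

Strings whose prefix is not listed are left as a single piece, so their trailing digits are compared as ordinary text.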
Here are some usage examples:
>>> s2 = ['A', 'B', 'B11', 'B2', 'B21', 'C', 'C11', 'C2', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C11', 'C2', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B', 'C'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C2', 'C11', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B', 'D'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C11', 'C2', 'D2', 'D12']
If called with no arguments, prefixed_digits() returns a key function which behaves identically to trailing_digits:
>>> sorted(s1, key=prefixed_digits())
['2', '11', 'A', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'C2', 'C11']
Caveats:
- Due to a restriction in Python's re module regarding lookbehind syntax, multiple prefixes must have the same length.
- In Python 2.x, strings which are purely numeric will be sorted numerically regardless of which prefixes are supplied to prefixed_digits(). In Python 3, they'll cause a TypeError when sorted alongside non-numeric strings (except when called with no arguments, or in the special case of key=prefixed_digits(''), which will sort purely numeric strings numerically, and prefixed strings alphabetically). Fixing that may be possible with a significantly more complex regex, but I gave up trying after about twenty minutes.
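Both caveats come from the lookbehind and from keys that can hold a string in one item and an integer in the same slot of another. As a rough sketch of one possible workaround (not part of the original answer, and assuming Python 3.4+ for fullmatch()), the prefix can be matched with a plain alternation instead of a lookbehind, and the key can be a tuple whose first slot is always a string and whose second slot, if present, is always an integer. The name prefixed_digits_by_match is hypothetical.

```python
import re

def prefixed_digits_by_match(*prefixes):
    # Hypothetical alternative to prefixed_digits(); a sketch only.
    if prefixes:
        # Plain alternation, so prefixes may differ in length.
        head = '|'.join(re.escape(p) for p in prefixes)
    else:
        # No prefixes given: accept any (possibly empty) prefix,
        # mirroring the behaviour of trailing_digits.
        head = '.*?'
    pattern = re.compile(r'(%s)(\d+)' % head)

    def key(x):
        m = pattern.fullmatch(x)
        if m:
            # (str, int): digits after a listed prefix compare numerically.
            return (m.group(1), int(m.group(2)))
        # (str,): everything else compares as ordinary text.
        return (x,)
    return key

s2 = ['A', 'B', 'B11', 'B2', 'B21', 'C', 'C11', 'C2', 'D12', 'D2']
print(sorted(s2, key=prefixed_digits_by_match('B')))
print(sorted(['AB10', 'AB9', 'C10', 'C9'], key=prefixed_digits_by_match('AB', 'C')))
print(sorted(['B11', '2', 'B2', '11'], key=prefixed_digits_by_match('B')))
```

With this key, purely numeric strings that are not covered by a listed prefix are simply sorted as text rather than raising, which may or may not be the behaviour you want.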