Python: turn single array of sorted, repeat values into an array of arrays?
I have a sorted array with some repeated values. How can this array be turned into an array of arrays with the subarrays grouped by value (see below)? In actuality, my_first_array has ~8 million entries, so the solution would preferably be as time efficient as possible.
my_first_array = [1,1,1,3,5,5,9,9,9,9,9,10,23,23]
wanted_array = [ [1,1,1], [3], [5,5], [9,9,9,9,9], [10], [23,23] ]
itertools.groupby makes this trivial:
import itertools
wanted_array = [list(grp) for _, grp in itertools.groupby(my_first_array)]
With no key function, it just yields groups consisting of runs of identical values, so you list-ify each one in a list comprehension; easy-peasy. You can think of it as basically a within-Python API for doing the work of the GNU toolkit program uniq and related operations.
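To illustrate the uniq comparison, here is a minimal sketch of two uniq-style operations built on groupby. The names unique_values and run_counts are chosen for this example only and do not come from the question.

```python
from itertools import groupby

my_first_array = [1, 1, 1, 3, 5, 5, 9, 9, 9, 9, 9, 10, 23, 23]

# Roughly like `uniq`: keep one representative per run of equal values.
unique_values = [k for k, _ in groupby(my_first_array)]

# Roughly like `uniq -c`: pair each value with the length of its run.
# sum(1 for _ in grp) counts the run without building a list for it.
run_counts = [(k, sum(1 for _ in grp)) for k, grp in groupby(my_first_array)]

print(unique_values)
print(run_counts)
```

Counting with a generator expression instead of len(list(grp)) avoids materializing each group, which may matter with millions of entries.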
In CPython (the reference interpreter), groupby is implemented in C, and it operates lazily and in linear time. The data must already appear in runs matching the key function, so if you had to sort first, the sort might make it too expensive; but for already sorted data like you have, there is nothing that will be more efficient.
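The requirement that equal values already sit in runs can be seen in a small sketch; the list called unsorted is made up for this illustration.

```python
from itertools import groupby

# Hypothetical input where equal values are NOT adjacent.
unsorted = [1, 1, 3, 1, 3, 3]

# groupby only merges adjacent equal values, so 1 and 3 each appear twice.
runs = [list(grp) for _, grp in groupby(unsorted)]

# Sorting first (O(n log n)) brings equal values together into single runs.
grouped = [list(grp) for _, grp in groupby(sorted(unsorted))]

print(runs)
print(grouped)
```

With input that is already sorted, as in the question, the sorted() call is unnecessary and the whole operation stays a single linear pass.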
Note: If the inputs might be value identical, but different objects, it may make sense for memory reasons to change list(grp) for _, grp to [k] * len(list(grp)) for k, grp. The former would retain the original (possibly value but not identity duplicate) objects in the final result; the latter would replicate the first object from each group instead, reducing the final cost per group to the cost of N references to a single object, instead of N references to between 1 and N objects.
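A rough sketch of the difference between the two variants follows. The Tag class is a hypothetical stand-in for any type whose instances compare equal by value while being separate objects; it is not part of the original question.

```python
from itertools import groupby


class Tag:
    """Hypothetical value type: equal by value, but each instance is its own object."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Tag) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


data = [Tag(1), Tag(1), Tag(1), Tag(3)]

# Variant 1: keeps every original object in the result.
kept = [list(grp) for _, grp in groupby(data)]

# Variant 2: repeats a reference to the first object of each run.
shared = [[k] * len(list(grp)) for k, grp in groupby(data)]

# Count distinct objects in the first group of each result.
distinct_in_kept = len({id(x) for x in kept[0]})
distinct_in_shared = len({id(x) for x in shared[0]})
print(distinct_in_kept, distinct_in_shared)
```

The second variant lets the other equal objects be garbage collected once the input is released, but it is only appropriate when the objects are interchangeable, for example when they are immutable.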