Python: turn single array of sorted, repeat values into an array of arrays?

Problem description

I have a sorted array with some repeated values. How can this array be turned into an array of arrays with the subarrays grouped by value (see below)? In actuality, my_first_array has ~8 million entries, so the solution would preferably be as time efficient as possible.

my_first_array = [1,1,1,3,5,5,9,9,9,9,9,10,23,23]

wanted_array = [ [1,1,1], [3], [5,5], [9,9,9,9,9], [10], [23,23] ]

Solution

itertools.groupby makes this trivial:

import itertools

wanted_array = [list(grp) for _, grp in itertools.groupby(my_first_array)]

With no key function, it just yields groups consisting of runs of identical values, so you list-ify each one in a list comprehension; easy-peasy. You can think of it as basically a within-Python API for doing the work of the GNU toolkit program, uniq, and related operations.
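To make the uniq comparison concrete, here is a minimal sketch using the sample data from the question. The names unique_values and run_lengths are illustrative only and do not come from the original answer.

```python
import itertools

my_first_array = [1, 1, 1, 3, 5, 5, 9, 9, 9, 9, 9, 10, 23, 23]

# With no key function, groupby yields (value, iterator-over-run) pairs.
wanted_array = [list(grp) for _, grp in itertools.groupby(my_first_array)]

# Roughly what `uniq` reports: one entry per run of equal values.
unique_values = [k for k, _ in itertools.groupby(my_first_array)]

# Roughly what `uniq -c` reports: each value with the length of its run.
run_lengths = [(k, sum(1 for _ in grp))
               for k, grp in itertools.groupby(my_first_array)]

print(wanted_array)
print(unique_values)
print(run_lengths)
```

Each group iterator has to be consumed (here by list or sum) before moving on to the next group, because all groups share the same underlying iterator.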

In CPython (the reference interpreter), groupby is implemented in C, and it operates lazily in a single linear pass. The data must already appear in runs matching the key function, so if you had to sort first, the O(n log n) sort could make it too expensive. For already sorted data like you have, it is hard to do better with the standard library.
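The requirement that equal values already sit next to each other can be seen in a small sketch. The list named unsorted is made-up data for illustration, not part of the question.

```python
import itertools

# Hypothetical input where equal values are NOT all adjacent.
unsorted = [1, 3, 1, 1, 3]

# groupby only merges consecutive equal values, so 1 and 3 each
# show up in more than one group here.
runs_only = [list(grp) for _, grp in itertools.groupby(unsorted)]

# Sorting first brings equal values together, at the cost of the sort.
grouped_after_sort = [list(grp)
                      for _, grp in itertools.groupby(sorted(unsorted))]

print(runs_only)
print(grouped_after_sort)
```

For the already sorted array in the question, the sorted call is unnecessary and groupby alone does the whole job in one pass.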

Note: If the inputs might be value identical, but different objects, it may make sense for memory reasons to change list(grp) for _, grp to [k] * len(list(grp)) for k, grp. The former would retain the original (possibly value but not identity duplicate) objects in the final result, the latter would replicate the first object from each group instead, reducing the final cost per group to the cost of N references to a single object, instead of N references to between 1 and N objects.
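The difference between the two variants in the note only shows up when equal values are distinct objects. The sketch below uses a hypothetical Label class (not from the original answer) whose instances compare equal by name, purely to make object identity visible.

```python
import itertools


class Label:
    """Hypothetical value type: equal by name, but each instance is distinct."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Label) and self.name == other.name


items = [Label("a"), Label("a"), Label("b")]

# Variant 1: keeps every original object in the result.
keep_originals = [list(grp) for _, grp in itertools.groupby(items)]

# Variant 2: repeats a reference to the first object of each run.
share_first = [[k] * len(list(grp)) for k, grp in itertools.groupby(items)]

# Both results have the same shape, but different object identity.
print(keep_originals[0][0] is keep_originals[0][1])  # distinct objects
print(share_first[0][0] is share_first[0][1])        # same object
```

Sharing one object per group is only safe when the objects are treated as immutable, since a change through one reference would be visible through all of them.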
