What is the Pythonic way of slicing a set?


Question

I have a list of data, for example:

some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]

I want to get a fixed number of unique values (I don't care which ones I get), and I want the result to be a set object.

I know that I can build a set from some_data, convert it to a list, slice it, and then convert it back to a set:

set(list(set(some_data))[:5])  # doesn't look so friendly

I understand that set has no __getitem__ method, which is why slicing it directly is not possible, but is there a way to make this look better?

I also completely understand that a set is unordered, so it doesn't matter which elements end up in the final set.

Another possible option is:

# list() is needed on Python 3, where dict.keys() returns a view that cannot be sliced
set(list(dict(map(lambda x: (x, None), some_data)).keys())[:2])  # not that great

Recommended answer

Sets are iterable. If you really don't care which items from your set are selected, you can use itertools.islice to get an iterator that yields a specified number of items (whichever ones come first in the iteration order). Pass that iterator to the set constructor and you have your subset without building any intermediate lists:

import itertools

some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]
big_set = set(some_data)
small_set = set(itertools.islice(big_set, 5))
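If this pattern is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch: the name take_from_set is hypothetical and not part of the original answer. It also shows that islice simply stops early when the set holds fewer items than requested, so no error is raised in that case.

```python
from itertools import islice


def take_from_set(source_set, n):
    """Return a new set with up to n arbitrary elements of source_set.

    Hypothetical helper for illustration; which elements are chosen
    depends on the set's iteration order.
    """
    return set(islice(source_set, n))


some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]
big_set = set(some_data)

small_set = take_from_set(big_set, 5)       # exactly 5 elements
everything = take_from_set(big_set, 100)    # fewer than 100 available: all of them
```

Pass a set rather than the original list: slicing the list directly could pick up duplicates and leave you with fewer than n unique values.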

While this is what you asked for, I'm not sure you should really use it. Sets may iterate in a very deterministic order, so if your data often contains many similar values, you may end up selecting a very similar subset every time you do this. This is especially bad when the data consists of integers (as in the example), which hash to themselves. Consecutive integers will very frequently appear in order when iterating a set. With the code above, only 32 is out of order in big_set (using Python 3.5), so small_set is {32, 1, 2, 3, 4}. If you added 0 to your data, you'd almost always end up with {0, 1, 2, 3, 4} even if the dataset grew huge, since those values will always fill up the first five slots in the set's hash table.
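The two claims behind this warning can be checked directly. This is a minimal sketch, assuming CPython; the variable names are illustrative, and the exact elements that islice returns are deliberately not asserted because they depend on the interpreter version.

```python
from itertools import islice

# Small non-negative integers hash to themselves in CPython.
ints_hash_to_self = all(hash(i) == i for i in range(1000))

# Iterating an unmodified set twice yields the same order,
# so taking the "first" five items gives the same subset each time.
big_set = set(range(10000))
first = set(islice(big_set, 5))
second = set(islice(big_set, 5))
same_every_time = first == second
```

In other words, the islice approach gives you an arbitrary subset, not a random one.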

To avoid such deterministic sampling, you can use random.sample, as suggested by jprockbelly.
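A sketch of the random.sample variant, under the assumption that an intermediate list is acceptable: since Python 3.11, random.sample requires a sequence and no longer accepts a set directly, so the set is converted with list() first.

```python
import random

some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]
big_set = set(some_data)

# random.sample needs a sequence (required on Python 3.11+), hence list().
# It raises ValueError if asked for more items than the population holds.
small_set = set(random.sample(list(big_set), 5))
```

Unlike the islice version, this brings back the intermediate list, which is the price of getting a genuinely random subset.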
