Sorted and reversed on huge dict?

Problem description

Hello,

I would like to sort(ed) and reverse(d) the result of many huge
dictionaries (a single dictionary will contain ~150,000 entries). Keys
are words, values are counts (integers).

I'm wondering if I can have tens of these in memory, or if I should
proceed one after the other.

But moreover, I'm interested in saving these as value/key pairs, sorted
and reversed (i.e. most frequent first). I can do it with the Unix sort
command, but I wonder how I should do it in Python to be memory
friendly.

Can it be done by using:

from itertools import izip
pairs = izip(d.itervalues(), d.iterkeys())
for v, k in reversed(sorted(pairs)):
    print k, v

or will it be the same as building the whole list?

Recommended answer

vd*****@yahoo.fr wrote:

I would like to sort(ed) and reverse(d) the result of many huge
dictionaries (a single dictionary will contain ~150,000 entries). Keys
are words, values are counts (integers).

I'm wondering if I can have tens of these in memory,

Depends on how much memory your machine has.

or if I should proceed one after the other.

Obviously that's more memory friendly if you can do it that way.

from itertools import izip
pairs = izip(d.itervalues(), d.iterkeys())
for v, k in reversed(sorted(pairs)):
    print k, v

or will it be the same as building the whole list?

I think the above is pretty good. sorted necessarily builds and
returns a list, but itervalues/iterkeys, izip, and reversed, just
build small iterator objects.
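
For readers on Python 3, where izip() and dict.itervalues()/iterkeys() no longer exist, a rough equivalent of the OP's loop might look like the sketch below. It assumes Python 3, where zip() is already lazy and d.values()/d.keys() are views, so, as described above, only sorted() materializes a full list. The function name most_frequent_first is hypothetical.

```python
from typing import Dict, Iterator, Tuple


def most_frequent_first(d: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Yield (word, count) pairs, highest count first (Python 3 sketch)."""
    # zip() and the dict views are lazy, like izip/itervalues/iterkeys in Python 2.
    pairs = zip(d.values(), d.keys())
    # sorted() has to see every pair, so this is the one place a full list is built.
    # reverse=True gives the same order as reversed(sorted(...)) here, because the
    # (count, word) tuples are all distinct (words are unique dictionary keys).
    for count, word in sorted(pairs, reverse=True):
        yield word, count


if __name__ == "__main__":
    counts = {"spam": 3, "eggs": 1, "ham": 3}
    for word, count in most_frequent_first(counts):
        print(word, count)
```

Ties on the count are broken by comparing the words, because whole tuples are compared.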

If the lists are really large, your next step after the above is
probably to use an external sort, but 150000 entries is not that many,
and anyway if sorting is a strain on available memory, then having the
dicts in memory at all will probably also be a strain. Maybe you
should start looking into storing the dict contents externally, such
as in a database.
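
The external sort mentioned above can be sketched with the Python 3 standard library. Sort fixed-size chunks in memory, spill each sorted run to a temporary file, then stream a k-way merge with heapq.merge(..., reverse=True). This is a minimal sketch, not a tuned implementation. The names external_sort_desc, _read_run and chunk_size are hypothetical, and the sketch assumes words contain no tab or newline characters, because each run is stored as tab-separated lines.

```python
import heapq
import tempfile
from itertools import islice
from typing import IO, Iterable, Iterator, List, Tuple

Pair = Tuple[int, str]  # (count, word)


def _read_run(f: IO[str]) -> Iterator[Pair]:
    """Lazily parse one sorted run back from its temporary file."""
    for line in f:
        count, word = line.rstrip("\n").split("\t", 1)
        yield int(count), word


def external_sort_desc(pairs: Iterable[Pair], chunk_size: int = 50_000) -> Iterator[Pair]:
    """Yield (count, word) pairs largest first, sorting roughly chunk_size
    pairs in memory at a time and keeping the sorted runs on disk."""
    it = iter(pairs)
    runs: List[IO[str]] = []
    try:
        while True:
            chunk = list(islice(it, chunk_size))
            if not chunk:
                break
            chunk.sort(reverse=True)  # sort this run in memory
            f = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
            runs.append(f)
            for count, word in chunk:
                f.write(f"{count}\t{word}\n")  # assumes no tabs/newlines in words
            f.seek(0)
        # k-way merge: each run is already sorted largest-first.
        yield from heapq.merge(*(_read_run(f) for f in runs), reverse=True)
    finally:
        for f in runs:
            f.close()
```

The input can be any iterable of (count, word) pairs, for example zip(d.values(), d.keys()) or pairs streamed from a file. In the latter case the full data set never has to be in memory at once.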


vd*****@yahoo.fr wrote:

I would like to sort(ed) and reverse(d) the result of many huge
dictionaries (a single dictionary will contain ~150,000 entries). Keys
are words, values are counts (integers).

Not sure 150k entries qualify as huge, though, unless you're short on
memory.

I'm wondering if I can have tens of these in memory, or if I should
proceed one after the other.

Why not just try it out?
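
One way to try it out in Python 3 is the standard tracemalloc module, which reports the peak memory that Python allocates while some code runs. The sketch below builds n_dicts synthetic word-count dictionaries. The generated keys (such as "w12345_3") and the function name peak_bytes_for_dicts are hypothetical stand-ins, so real numbers will vary with actual word lengths and the Python version.

```python
import tracemalloc
from typing import Dict, List


def peak_bytes_for_dicts(n_dicts: int, entries: int) -> int:
    """Return the peak bytes traced while holding n_dicts word-count dicts."""
    tracemalloc.start()
    try:
        # Hypothetical synthetic data: unique string keys mapped to integer counts.
        dicts: List[Dict[str, int]] = [
            {f"w{i}_{j}": i for i in range(entries)} for j in range(n_dicts)
        ]
        _current, peak = tracemalloc.get_traced_memory()
        del dicts
    finally:
        tracemalloc.stop()
    return peak
```

For example, compare peak_bytes_for_dicts(1, 150_000) with peak_bytes_for_dicts(10, 150_000) on the target machine. Tracing slows allocation down, so expect the measurement itself to take a little while.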


But moreover, I'm interested in saving these as value/key pairs, sorted
and reversed (i.e. most frequent first). I can do it with the Unix sort
command, but I wonder how I should do it in Python to be memory
friendly.

Can it be done by using:

from itertools import izip
pairs = izip(d.itervalues(), d.iterkeys())
for v, k in reversed(sorted(pairs)):
    print k, v

or will it be the same as building the whole list?

sorted() needs access to all the data, so it'll build a list even if you
feed it a generator. You will have to test it yourself, but I suspect
that the most memory-efficient way to do what you want could be:

import operator
items = d.items()  # Python 2: a new list of (key, value) tuples
items.sort(key=operator.itemgetter(1), reverse=True)  # sorts in place, by count

The items list would require a couple of megabytes for 150k dictionary
entries, or so. The key map needs some memory too, but the rest of the
sort is done in place.

</F>
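
Here is a Python 3 sketch of the same in-place approach. In Python 3, d.items() returns a view rather than a list, so it must be copied with list() first. The function names are hypothetical. approx_list_overhead() counts only the list and its tuples (via sys.getsizeof), not the word strings and integers, because those are shared with the dictionary. Actual sizes depend on the Python build.

```python
import operator
import sys
from typing import Dict, List, Tuple


def sort_items_by_count(d: Dict[str, int]) -> List[Tuple[str, int]]:
    """Python 3 version of the in-place sort: most frequent first."""
    # In Python 3, d.items() is a view, so copy it into a list before sorting.
    items = list(d.items())
    # list.sort() reorders the list in place; key= makes it compute and store
    # one key per item (the count), which is the extra "key map" memory.
    # reverse=True keeps the sort stable, so equal counts stay in dict order.
    items.sort(key=operator.itemgetter(1), reverse=True)
    return items


def approx_list_overhead(items: List[Tuple[str, int]]) -> int:
    """Bytes used by the list object and its tuples (words/ints not included)."""
    return sys.getsizeof(items) + sum(sys.getsizeof(t) for t in items)
```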


Fredrik Lundh <fr*****@pythonware.com> writes:

items = d.items()
items.sort(key=operator.itemgetter(1), reverse=True)

The items list would require a couple of megabytes for 150k dictionary
entries, or so. The key map needs some memory too, but the rest of
the sort is done in place.

I think the OP's method avoided the key map.
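
To illustrate the difference, this Python 3 sketch puts the two approaches side by side. The OP-style version sorts (count, word) tuples directly. It needs no key function and no separate key array, and tuple comparison breaks ties by word (descending here). The key= version stores one key per item and keeps ties in dictionary order. Both function names are hypothetical.

```python
import operator
from typing import Dict, List, Tuple


def by_tuple_compare(d: Dict[str, int]) -> List[Tuple[int, str]]:
    # OP-style: sort (count, word) tuples directly; no key= function, so no key array.
    return sorted(zip(d.values(), d.keys()), reverse=True)


def by_key_function(d: Dict[str, int]) -> List[Tuple[str, int]]:
    # key= style: sort (word, count) items by count, storing one key per item.
    items = list(d.items())
    items.sort(key=operator.itemgetter(1), reverse=True)
    return items


if __name__ == "__main__":
    counts = {"ham": 2, "spam": 2, "eggs": 5}
    print(by_tuple_compare(counts))
    print(by_key_function(counts))
```

Both list the counts in the same descending order. They differ only in how equal counts are ordered and in whether a key array is allocated.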

