Lists, tuples and memory


Problem Description

Hi, all!

Here is the problem:
I have a file which contains a common dictionary - one word per line
(approx. 700KB and 70000 words). I have to read it into memory for future
"spell checking" of the words coming from the customer. The file is
presorted. So here it goes:

lstdict = map(lambda x: x.lower().strip(),
file("D:\\CommonDictionary.txt"))

Works like a charm. It takes my machine 0.7 seconds to do the trick, and
python.exe (from Task Manager data) was using 2636K before this line
executed and 5520K after. The difference is 2884K. Not that bad, taking
into account that in C I'd read the file into memory (700K), scan for CRs,
count them, replace them with '\0', and allocate an index vector of the
word beginnings of the size I found while counting CRs. In this particular
case the index vector would be almost 300K. So far so good!
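
For a rough sense of where that 2884K goes, a small sketch like the following
tallies the list and the per-string overhead directly (sys.getsizeof appeared
in later Python versions than the one discussed in this thread, and the exact
numbers vary by version and platform):

import sys

words = [x.lower().strip() for x in open("D:\\CommonDictionary.txt")]

list_bytes = sys.getsizeof(words)                    # the list's array of pointers
string_bytes = sum(sys.getsizeof(w) for w in words)  # 70000 string objects, each
                                                     # carrying object overhead on
                                                     # top of its characters
print(list_bytes, string_bytes, list_bytes + string_bytes)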

Then I realized that lstdict as a list is overkill. A tuple is enough
in my case. So I modified the code:

t = tuple(file("D:\\CommonDictionary.txt"))
lstdict = map(lambda x: x.lower().strip(), t)

This code works a little bit faster: 0.5 sec, but takes 5550K of memory.
And maybe this is understandable: after all, the first line creates a
list and a tuple, and the second another tuple (all of the same size).

But then I decided to compare these pieces:

t = tuple(file("D:\\CommonDictionary.txt")) # 1
lstdict = map(lambda x: x.lower().strip(), t) # 2

lstdict = map(lambda x: x.lower().strip(),
file("D:\\CommonDictionary.txt")) # 3

As expected, after line 2 memory was 5550K, but after line 3 it jumped
to 7996K!!!

The question:

If reference counting is used, why didn't the second assignment to lstdict
(line 3) free the memory allocated by the first one (line 2) and reuse it?

So one more experiment:

t = tuple(file("D:\\CommonDictionary.txt")) # 1
lstdict = map(lambda x: x.lower().strip(), t) # 2
del lstdict # 3
lstdict = map(lambda x: x.lower().strip(),
file("D:\\CommonDictionary.txt")) # 4

In this case executing line 4 did not add memory!

By the way, the search speed is very, very good when done by this function:

def inDict(lstdict, word):
    try:
        return lstdict[bisect(lstdict, word) - 1] == word
    except:
        return False

500 words are tested in less than 0.03 seconds.
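
For reference, a minimal sketch of how that function can be exercised. It
assumes the presorted word file from the post; the import (bisect lives in the
standard bisect module) and the narrowed except clause are the only additions:

from bisect import bisect

def inDict(lstdict, word):
    # lstdict must be sorted; bisect() returns the insertion point for word,
    # so the element just before it equals word exactly when word is present.
    try:
        return lstdict[bisect(lstdict, word) - 1] == word
    except IndexError:          # only possible for an empty list
        return False

lstdict = [x.lower().strip() for x in open("D:\\CommonDictionary.txt")]

print(inDict(lstdict, "apple"))    # True if "apple" is in the word file
print(inDict(lstdict, "zzzzzz"))   # False for a word that is not there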

Recommended Answer

On 15 Jul 2004, Elbert Lev wrote:
But then I decided to compare these pieces:

t = tuple(file("D:\\CommonDictionary.txt")) # 1
lstdict = map(lambda x: x.lower().strip(), t) # 2

lstdict = map(lambda x: x.lower().strip(),
file("D:\\CommonDictionary.txt")) # 3

As expected, after line 2 memory was 5550K, but after line 3 it jumped
to 7996K!!!

The question:

If reference counting is used, why didn't the second assignment to lstdict
(line 3) free the memory allocated by the first one (line 2) and reuse it?




Because after line 2, you still have a 2MB tuple sitting around in 't'. ;)

Try this:

t = tuple(file("D:\\CommonDictionary.txt")) # 1
lstdict = map(lambda x: x.lower().strip(), t)
del t # 2

lstdict = map(lambda x: x.lower().strip(),
file("D:\\CommonDictionary.txt")) # 3

Python doesn't know that you're not going to try to access t at some
future point, even though there are no more accesses in your code.
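
To make that concrete, here is a small sketch (using sys.getrefcount, and
open() in place of the Python 2 file() built-in) of how the name t keeps the
intermediate tuple alive until it is deleted or rebound:

import sys

t = tuple(open("D:\\CommonDictionary.txt"))   # raw lines, owned by the name 't'
lstdict = [x.lower().strip() for x in t]      # cleaned copy, owned by 'lstdict'

# 't' still refers to the raw tuple, so its memory cannot be reclaimed yet.
print(sys.getrefcount(t))   # typically 2: the name 't' plus the temporary
                            # reference created by the getrefcount() call itself

del t   # drop the last reference; CPython's reference counting can now free
        # the raw tuple and its line strings immediately

# Plain rebinding (lstdict = ...) evaluates the right-hand side first, so the
# old list and the new one coexist briefly; an explicit 'del lstdict' before
# rebuilding avoids holding both at once, which is what the last experiment
# in the original post shows.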

A couple of other thoughts on what you're doing:

1) Stick with lists. Tuples are meant to be used not really as immutable
lists, but as a way to group related items of different types. E.g. the
sequence 'apple', 'orange', 'pear', 'banana' should be stored in a list,
whereas the group of related items 'apple', 'red', 3, 5 (presumably
describing an apple in some predefined manner, say,
fruit, color, width, height) should be stored in a tuple. At least that's
what Guido wants us to do. ;)

2) Assuming you're using a newer version of Python, try using a list
comprehension instead of map(). It's a little bit cleaner, and will
probably be a bit faster too:

lstdict = [x.lower().strip() for x in file("D:\\CommonDictionary.txt")]

If you've never worked with them before, list comprehensions are a new
syntax intended to replace map(), filter(), and other such constructs with
one unified syntax. They're usually very easy to read and write (as the
one above is) since they help you avoid using lambda.


Any reason not to put the words into a dictionary?

Then your code becomes:

lstdict = dict([(x.lower().strip(), None) for x in
                file("D:\\CommonDictionary.txt")])
if lstdict.has_key(word):
    do something

(not tested, but should be close)

I'll bet access is faster (even if loading is slightly slower).
You also benefit from the fact that the dictionary file
doesn't need to be kept sorted.
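
A runnable sketch of that idea (has_key is the old Python 2 spelling; the in
operator works on a dict, and a plain set gives the same constant-time
membership test; in_dict below is just an illustrative helper name):

# Build a hash-based lookup table from the word file (path from the post).
# A set is the natural container when only membership matters; a dict
# mapping each word to None, as suggested above, behaves the same way.
words = set([x.lower().strip() for x in open("D:\\CommonDictionary.txt")])

def in_dict(word):
    # Average O(1) lookup, and the file no longer needs to be presorted.
    return word.lower() in words

print(in_dict("Apple"))   # True if "apple" is in the word file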

HTH,
Larry Bates
Syscon, Inc.

"Elbert Lev" <el*******@hotmail.com> wrote in message
news:94**************************@posting.google.c om...
大家好!

这是问题所在:
我有一个文件,其中包含一个常用字典 - 每行一个字
(appr 700KB和70000字)。我必须在记忆中阅读它以便将来
拼写检查来自客户的话。该文件已经预先分类。所以这里是:

lstdict = map(lambda x:x.lower()。strip(),
文件(" D:\\CommonDictionary.txt"))

像魅力一样。它需要我的机器0.7秒才能完成这个技巧
和python.exe(来自任务管理器数据)正在使用此行之前执行
2636K,以及5520K之后。差异是2884K。没那么糟糕,考虑到在C中我会读取内存中的文件(700K)扫描
CR,计算它们,替换为''\0''并分配索引
在计算CR时我发现的大小的单词beginnigs的向量。在这个特定情况下,索引向量几乎是300K。到目前为止一直很好!

然后我意识到,作为一个列表的lstdict是一种矫枉过正。在我的情况下,元组足够了。所以我修改了代码:

t = tuple(文件(&D:\\CommonDictionary.txt"))
lstdict = map(lambda x:x.lower() .strip(),t)

这段代码工作得快一点:0.5秒,但需要5550K内存。
也许这是可以理解的:毕竟第一行创建了一个
列表和一个元组和第二个另一个元组(都是相同大小)。

然后我决定比较这些部分:

t =元组(文件( D:\\CommonDictionary.txt))#1
lstdict = map(lambda x:x.lower()。strip(),t)#2

lstdict = map(lambda x:x.lower()。strip(),
file(" D:\\CommonDictionary.txt"))#3

正如所料,之后第2行内存是5550K,但在第3行后它跳到了7996K !!!

如果使用了引用计数,为什么第二次分配lstdict
(第3行)没有释放第一个(第2行)所记录的内存和/或重用它?

再做一个实验:

t =元组(文件(D:\\CommonDictionary.txt))#1
lstdict = map(lambda x:x.lower()。strip(),t)#2
del lstdict#3
lstdict = map(lambda x:x.lower()。strip(),
文件(D:\\CommonDictionary.txt))#4

在这种情况下,执行第4行没有添加内存!

顺便说一下,按功能完成搜索速度非常好:

def inDict(lstdict,word):
尝试:
返回lstdict [bisect(lstdict,word) ) - 1] == word
除了:
返回False

500字在不到0.03秒内测试。
Hi, all!

Here is the problem:
I have a file, which contains a common dictionary - one word per line
(appr. 700KB and 70000 words). I have to read it in memory for future
"spell checking" of the words comming from the customer. The file is
presorted. So here it goes:

lstdict = map(lambda x: x.lower().strip(),
file("D:\\CommonDictionary.txt"))

Works like a charm. It takes on my machine 0.7 seconds to do the trick
and python.exe (from task manager data) was using before this line is
executed
2636K, and after 5520K. The difference is 2884K. Not that bad, taking
into account that in C I''d read the file in memory (700K) scan for
CRs, count them, replace with ''\0'' and allocate the index vector of
the word beginnigs of the size I found while counting CRs. In this
particular case index vector would be almost 300K. So far so good!

Then I realized, that lstdict as a list is an overkill. Tuple is
enough in my case. So I modified the code:

t = tuple(file("D:\\CommonDictionary.txt"))
lstdict = map(lambda x: x.lower().strip(), t)

This code works a little bit faster: 0.5 sec, but takes 5550K memory.
And maybe this is understandable: after all the first line creates a
list and a tuple and the second another tuple (all of the same size).

But then I decieded to compare this pieces:

t = tuple(file("D:\\CommonDictionary.txt")) # 1
lstdict = map(lambda x: x.lower().strip(), t) # 2

lstdict = map(lambda x: x.lower().strip(),
file("D:\\CommonDictionary.txt")) # 3

As expected, after line 2 memory was 5550K, but after line 3 it jumped
to 7996K!!!

The question:

If refference counting is used, why the second assignment to lstdict
(line 3) did not free the memory alocated by the first one (line 2)
and reuse it?

So one more experiment:

t = tuple(file("D:\\CommonDictionary.txt")) # 1
lstdict = map(lambda x: x.lower().strip(), t) # 2
del lstdict # 3
lstdict = map(lambda x: x.lower().strip(),
file("D:\\CommonDictionary.txt")) # 4

In this case executing line 4 did not add memory!

By the way, the search speed is very-very good when done by function:

def inDict(lstdict, word):
try:
return lstdict[bisect(lstdict, word) - 1] == word
except:
return False

500 words are tested in less then 0.03 seconds.


< br>

" Larry Bates" <磅**** @ swamisoft.com>写道:
"Larry Bates" <lb****@swamisoft.com> wrote:
Any reason not to put the words into a dictionary?
[...]
I'll bet access is faster (even if loading is slightly slower).




One trick I've used when I have something with a long startup time
because it's parsing static files and building data structures is to
build the data structure once, then pickle it (see the pickle and
cPickle modules).

Your production code then just has to call cPickle.load(file) and you're
up and running way faster than re-parsing the original data files.
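
A minimal sketch of that pickle trick; the .pkl file name is made up for
illustration, and the set/cPickle combination is one reasonable choice, not
the only one:

import cPickle   # on Python 3 this would simply be the pickle module

DICT_TXT = "D:\\CommonDictionary.txt"   # original word list
DICT_PKL = "D:\\CommonDictionary.pkl"   # pre-built, pickled structure

def build_and_save():
    # Slow path, run once: parse the text file and pickle the result.
    words = set([x.lower().strip() for x in open(DICT_TXT)])
    f = open(DICT_PKL, "wb")
    cPickle.dump(words, f, cPickle.HIGHEST_PROTOCOL)
    f.close()

def load_words():
    # Fast path, run at every startup: unpickle the ready-made structure
    # instead of re-parsing the original text file.
    f = open(DICT_PKL, "rb")
    words = cPickle.load(f)
    f.close()
    return words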

