Gather information from various files efficiently


Problem description


Hello,

I need to gather information that is contained in various files.

Like so:

file1:
=====================
foo : 1 2
bar : 2 4
baz : 3
=====================

file2:
=====================
foo : 5
bar : 6
baz : 7
=====================

file3:
=====================
foo : 4 18
bar : 8
=====================

The straightforward way to solve this problem is to create a
dictionary. Like so:

[...]

a, b = get_information(line)
if a in dict.keys():
    dict[a].append(b)
else:
    dict[a] = [b]

Yet, I have got 43 such files. Together they are 4,1M
large. In the future, they will probably become much larger.
At the moment, the process takes several hours. As it is a process
that I have to run very often, I would like it to be faster.

How could the problem be solved more efficiently?
Klaus

Recommended answer

Klaus Neuner wrote:
> Hello,
>
> I need to gather information that is contained in various files.
>
> Like so:
>
> file1:
> =====================
> foo : 1 2
> bar : 2 4
> baz : 3
> =====================
>
> file2:
> =====================
> foo : 5
> bar : 6
> baz : 7
> =====================
>
> file3:
> =====================
> foo : 4 18
> bar : 8
> =====================
>
> The straightforward way to solve this problem is to create a
> dictionary. Like so:
>
> [...]
>
> a, b = get_information(line)
> if a in dict.keys():
>     dict[a].append(b)
> else:
>     dict[a] = [b]

Aye...

The dict.keys() line creates a temporary list, and then the 'in' does a
linear search of the list. Better would be:

try:
    dict[a].append(b)
except KeyError:
    dict[a] = [b]

Since you expect the key to be there most of the time, this method is
most efficient. You optimistically get the dictionary entry, and in the
exceptional case where it doesn't yet exist you add it.


--
\/ \/
(O O)
-- --------------------oOOo~(_)~oOOo----------------------------------------
Keith Dart <kd***@kdart.com>
public key: ID: F3D288E4
============================================================================
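
For illustration, the try/except idiom above can be wrapped in a small function. This is only a sketch: the name group_pairs and the sample pairs are made up here and stand in for whatever get_information returns for each line.

```python
def group_pairs(pairs):
    """Collect the values seen for each key using the try/except idiom."""
    grouped = {}
    for key, value in pairs:
        try:
            # Common case: the key already exists, so just append.
            grouped[key].append(value)
        except KeyError:
            # First time this key is seen: start a new list for it.
            grouped[key] = [value]
    return grouped


# Made-up sample data in the shape of the files from the question.
pairs = [("foo", "1 2"), ("bar", "2 4"), ("baz", "3"), ("foo", "5")]
result = group_pairs(pairs)
print(result)
```

Testing `key in grouped` directly on the dictionary, without calling keys(), is also a hash lookup and never builds a list of keys, so it is another way to avoid the linear search.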


Keith Dart wrote:
> try:
>     dict[a].append(b)
> except KeyError:
>     dict[a] = [b]


or my favorite Python shortcut:

    dict.setdefault(a, []).append(b)

Kent
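
To show the shortcut in context, here is a rough sketch that applies it to text in the format from the question. The original get_information was not posted, so parse_line below is a hypothetical stand-in that simply splits on the first colon; FILE1, FILE2 and the function names are likewise made up for the example. The second function uses collections.defaultdict, a standard-library alternative that nobody in the thread proposed, shown only for comparison.

```python
from collections import defaultdict

# Made-up contents in the format shown in the question.
FILE1 = """\
=====================
foo : 1 2
bar : 2 4
baz : 3
=====================
"""

FILE2 = """\
=====================
foo : 5
bar : 6
baz : 7
=====================
"""


def parse_line(line):
    """Hypothetical stand-in for get_information: split 'key : value'."""
    key, _, value = line.partition(":")
    return key.strip(), value.strip()


def gather(lines):
    """Group values by key with dict.setdefault."""
    grouped = {}
    for line in lines:
        if ":" not in line:
            continue  # skip separator lines and blank lines
        a, b = parse_line(line)
        grouped.setdefault(a, []).append(b)
    return grouped


def gather_defaultdict(lines):
    """Same grouping, using collections.defaultdict instead."""
    grouped = defaultdict(list)
    for line in lines:
        if ":" not in line:
            continue
        a, b = parse_line(line)
        grouped[a].append(b)
    return dict(grouped)


lines = FILE1.splitlines() + FILE2.splitlines()
print(gather(lines))
```

With real files, the lines would come from iterating over each open file rather than from in-memory strings.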

Keith Dart wrote:
> Aye...
>
> The dict.keys() line creates a temporary list, and then the 'in' does a
> linear search of the list. Better would be:
>
> try:
>     dict[a].append(b)
> except KeyError:
>     dict[a] = [b]
>
> Since you expect the key to be there most of the time, this method is
> most efficient. You optimistically get the dictionary entry, and in the
> exceptional case where it doesn't yet exist you add it.



I wonder if

    dct.setdefault(a, []).append(b)

wouldn't be even faster. It saves setting up the try/except frame handling in
Python (I assume the C implementation of dicts achieves similar results with
much less overhead).

Cheers,

f

PS. I changed dict->dct because it's generally a Bad Idea (TM) to name local
variables after builtin types. This is for the benefit of the OP (I know you
were just following his code conventions).
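
Whether setdefault really beats try/except is easiest to settle by measuring. The following is a minimal sketch using the standard timeit module; the function names and the workload (20,000 pairs spread over 50 keys) are invented for the test and are not taken from the original files, so the numbers it prints say nothing definite about the real data.

```python
import timeit


def with_try(pairs):
    """Group values using try/except KeyError."""
    dct = {}
    for a, b in pairs:
        try:
            dct[a].append(b)
        except KeyError:
            dct[a] = [b]
    return dct


def with_setdefault(pairs):
    """Group values using dict.setdefault."""
    dct = {}
    for a, b in pairs:
        dct.setdefault(a, []).append(b)
    return dct


# Made-up workload: many pairs, few distinct keys, so most lookups hit.
pairs = [("key%d" % (i % 50), i) for i in range(20_000)]

for func in (with_try, with_setdefault):
    seconds = timeit.timeit(lambda: func(pairs), number=5)
    print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```

Note that setdefault builds a fresh empty list on every call, even when the key is already present, so which variant wins depends on the data and the Python version and is better measured than assumed.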

