Python iteration order on a set

Question

I am parsing two big files (on the order of gigabytes), each containing keys and corresponding values. Some keys are shared between the two files, but with differing values. For each file, I want to write to a new file the keys* and their values, where keys* denotes the keys present in both file1 and file2. I don't care about the key order in the output, but it must be exactly the same in the two output files.

File 1:

key1
value1-1
key2
value1-2
key3
value1-3

File 2:

key1
value2-1
key5
value2-5
key2
value2-2

A valid output would be:

Parsed file 1:

key1
value1-1
key2
value1-2

Parsed file 2:

key1
value2-1
key2
value2-2

Another valid output:

Parsed file 1:

key2
value1-2
key1
value1-1

Parsed file 2:

key2
value2-2
key1
value2-1

An invalid output (keys in a different order in file 1 and file 2):

Parsed file 1:

key2
value1-2
key1
value1-1

Parsed file 2:

key1
value2-1
key2
value2-2

One last detail: the values are far larger than the keys.

What I plan to do:

  • For each input file, parse it and return a dict (let's call it file_index) whose keys are the keys found in the file and whose values are the offsets at which each key was found in the input file.

  • Compute the intersection of the keys:

good_keys = file1_index.viewkeys() & file2_index.viewkeys()

  • Do something like this (pseudo-code):

    for each file:
        for good_key in good_keys:
            offset = file_index[good_key]
            go to offset in input_file
            get corresponding value
            write (key, value) to output file
    

Does iterating over the same set guarantee that I get exactly the same order both times (provided it is the same set, which I won't modify between the two iterations), or should I convert the set to a list first and iterate over the list?
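For concreteness, the plan described above might look roughly like the sketch below. It is only an illustration under several assumptions: Python 3 (where `dict.keys()` plays the role of `viewkeys()`), input files in which key lines and value lines strictly alternate, and hypothetical helper names `build_index`, `write_common` and `demo` that do not come from the question. It freezes the shared keys into a sorted list once and reuses that list for both files, so the output order does not depend on set iteration at all.

```python
import os
import tempfile


def build_index(path):
    """Map each key to the byte offset at which its key line starts."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            key = f.readline()
            if not key:  # end of file
                break
            index[key.rstrip(b"\r\n")] = offset
            f.readline()  # skip over the value line that follows the key
    return index


def write_common(path, index, ordered_keys, out_path):
    """Write the (key, value) pairs for ordered_keys, in exactly that order."""
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        for key in ordered_keys:
            src.seek(index[key])
            src.readline()  # skip the key line itself
            value = src.readline().rstrip(b"\r\n")
            dst.write(key + b"\n" + value + b"\n")


def demo():
    """Run the sketch on the two small example files from the question."""
    file1 = b"key1\nvalue1-1\nkey2\nvalue1-2\nkey3\nvalue1-3\n"
    file2 = b"key1\nvalue2-1\nkey5\nvalue2-5\nkey2\nvalue2-2\n"
    outputs = []
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, data in (("file1", file1), ("file2", file2)):
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(data)
            paths.append(path)
        indexes = [build_index(p) for p in paths]
        # Freeze the shared keys into one list and reuse it for both files.
        good_keys = sorted(indexes[0].keys() & indexes[1].keys())
        for path, index in zip(paths, indexes):
            out_path = path + ".out"
            write_common(path, index, good_keys, out_path)
            with open(out_path, "rb") as f:
                outputs.append(f.read())
    return outputs
```

Calling `demo()` returns the contents of the two output files. Sorting is optional here: any single list built once from the intersection would give both outputs the same order.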

Recommended answer

Python's dicts and sets are stable: if you iterate over them without changing them, they are guaranteed to give you the same order each time. This is from the documentation on dicts:

    Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond.
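A small check of this behaviour is sketched below, assuming Python 3. The helper name `iteration_orders` is made up for illustration. It only demonstrates repeatability within a single process. Since Python 3.3, string hashing is randomized per process by default, so the order of a set of strings can differ from one run of the program to the next. If the two output files are produced in separate runs, freezing the keys into a sorted list is the safer option.

```python
def iteration_orders(keys, passes=3):
    """Iterate over the same unmodified set several times, recording each order."""
    good_keys = set(keys)
    return [list(good_keys) for _ in range(passes)]


orders = iteration_orders(["key1", "key2", "key5", "key3"])
# Every pass over the unmodified set yields the same order.
assert all(order == orders[0] for order in orders)

# Freezing the order once makes the intent explicit; sorting additionally
# makes it reproducible across separate runs of the program.
frozen = sorted({"key2", "key1"})
print(frozen)  # ['key1', 'key2']
```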
