Finding a value within a dictionary of ranges - python


Problem description

I'm comparing two files. The first has an identifier column followed by a start value and an end value. The second contains the corresponding identifiers and a single value column.

Example:

File 1:

A     200     900
A     1000    1200
B     100     700
B     900     1000

File 2:

A     103
A     200
A     250
B     50
B     100
B     150

I would like to find every value in the second file that falls within one of the ranges given for the same identifier in the first file, so that my output looks like this:

A     200
A     250
B     100
B     150

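For now, I build a dictionary from the first file that maps each identifier to a list of every value covered by its ranges, for example:

# list() is needed in Python 3, where range() no longer returns a list
if Identifier in Dictionary:
    Dictionary[Identifier].extend(range(Start, End + 1))
else:
    Dictionary[Identifier] = list(range(Start, End + 1))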
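The small sketch below makes the cost of this approach concrete, using the File 1 sample. The helper names `expanded` and `in_ranges` are hypothetical. The sketch counts how many integers the expanded lists hold compared with the number of (start, end) pairs. It also checks that testing only the endpoints gives the same membership answer for integer values.

```python
# Sample ranges from File 1; expanded() and in_ranges() are illustrative helpers.
ranges = {"A": [(200, 900), (1000, 1200)], "B": [(100, 700), (900, 1000)]}

def expanded(pairs):
    """Expand (start, end) pairs into a list of every integer, as the question's code does."""
    values = []
    for start, end in pairs:
        values.extend(range(start, end + 1))
    return values

def in_ranges(value, pairs):
    """Answer the same membership question using only the endpoints."""
    return any(start <= value <= end for start, end in pairs)

for ident, pairs in ranges.items():
    print(ident, len(pairs), "pairs ->", len(expanded(pairs)), "stored integers")
# A 2 pairs -> 902 stored integers
# B 2 pairs -> 702 stored integers
```

With ranges spanning millions of values, the expanded lists dominate memory, and every `Value in list` test has to scan them linearly.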

I then go through the second file and check whether each value is in that dictionary, for example:

if Identifier in Dictionary:
    if Value in Dictionary[Identifier]:
        OutFile.write(Line + "\n")

While not optimal, this works for relatively small files. However, I have several large files, and on those this program is terribly inefficient. I need to optimize it so that it runs much faster.
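If some identifiers have very many ranges, even a linear scan over their (start, end) pairs can add up. One alternative is offered here only as a sketch and is a different technique from the answer's. It sorts each identifier's ranges, merges any that overlap, and binary-searches the start points with the standard `bisect` module. It assumes integer values and inclusive ends, as in the sample. The names `build_index` and `contains` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def build_index(rows):
    """rows: iterable of (ident, start, end) with inclusive integer ends.
    Returns {ident: (starts, ends)} with sorted, merged ranges."""
    grouped = defaultdict(list)
    for ident, start, end in rows:
        grouped[ident].append((start, end))
    index = {}
    for ident, pairs in grouped.items():
        pairs.sort()
        merged = []
        for start, end in pairs:
            if merged and start <= merged[-1][1] + 1:
                # Overlaps or touches the previous range (integers assumed): extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[ident] = ([s for s, _ in merged], [e for _, e in merged])
    return index

def contains(index, ident, value):
    """True if value lies in some range for ident; O(log n) per lookup."""
    if ident not in index:
        return False
    starts, ends = index[ident]
    i = bisect_right(starts, value) - 1  # last range starting at or before value
    return i >= 0 and value <= ends[i]

index = build_index([("A", 200, 900), ("A", 1000, 1200), ("B", 100, 700), ("B", 900, 1000)])
queries = [("A", 103), ("A", 200), ("A", 250), ("B", 50), ("B", 100), ("B", 150)]
print([q for q in queries if contains(index, *q)])
# [('A', 200), ('A', 250), ('B', 100), ('B', 150)]
```

Merging is what makes the single lookup correct. Consider overlapping ranges such as (1, 10) and (3, 5). The last range starting at or before 8 is (3, 5), which does not contain 8, even though (1, 10) does. Building the index costs one sort per identifier. Each lookup is then logarithmic in that identifier's number of ranges.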

Solution

from collections import defaultdict

# Map each identifier to a list of (start, end) tuples read from file 1.
ident_ranges = defaultdict(list)
with open('file1.txt', 'r') as f1:
    for row in f1:
        ident, start, end = row.split()
        start, end = int(start), int(end)
        ident_ranges[ident].append((start, end))

# Copy each line of file 2 whose value lies inside one of its identifier's ranges.
with open('file2.txt', 'r') as f2, open('out.txt', 'w') as output:
    for line in f2:
        ident, value = line.split()
        value = int(value)
        if any(start <= value <= end for start, end in ident_ranges[ident]):
            output.write(line)
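The sketch below checks the answer's logic against the sample data without creating files. It wraps the same loops in a function that takes already-open file objects and feeds it `io.StringIO` buffers holding File 1 and File 2. The function name `filter_lines` is hypothetical.

```python
import io
from collections import defaultdict

def filter_lines(f1, f2, output):
    """Same logic as the answer above, but taking already-open file objects."""
    ident_ranges = defaultdict(list)
    for row in f1:
        ident, start, end = row.split()
        ident_ranges[ident].append((int(start), int(end)))
    for line in f2:
        ident, value = line.split()
        value = int(value)
        if any(start <= value <= end for start, end in ident_ranges[ident]):
            output.write(line)

file1 = io.StringIO("A 200 900\nA 1000 1200\nB 100 700\nB 900 1000\n")
file2 = io.StringIO("A 103\nA 200\nA 250\nB 50\nB 100\nB 150\n")
out = io.StringIO()
filter_lines(file1, file2, out)
print(out.getvalue(), end="")
# A 200
# A 250
# B 100
# B 150
```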

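Notes: The main saving comes from storing each range as a (start, end) pair instead of expanding it into every integer it covers. Memory no longer grows with the width of the ranges, and each check costs two comparisons per range rather than a linear scan through a long list. Using a defaultdict lets you add ranges to the dictionary without first checking whether the key exists. Using any short-circuits the range check, stopping at the first range that contains the value. A chained comparison (start <= value <= end) is a convenient Python shortcut for start <= value and value <= end.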
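The sketch below demonstrates the three points in the notes, using the B ranges from File 1. The helper `in_range_logged` is hypothetical; it records which ranges were tested.

```python
from collections import defaultdict

checked = []

def in_range_logged(pair, value):
    """Record the range being tested, then apply a chained comparison."""
    checked.append(pair)
    start, end = pair
    return start <= value <= end  # equivalent to: start <= value and value <= end

b_ranges = [(100, 700), (900, 1000)]
print(any(in_range_logged(p, 150) for p in b_ranges))  # True
print(checked)  # [(100, 700)]: any() stopped at the first matching range

d = defaultdict(list)
d["A"].append((200, 900))  # no need to check whether "A" exists first
print(d["A"])     # [(200, 900)]
print(d["C"])     # []: a missing key yields a new empty list...
print(sorted(d))  # ['A', 'C']: ...which is also stored in the dictionary
```

The last point means that an identifier from file 2 that never appears in file 1 is simply skipped, because any() over an empty list is False. The cost is an empty entry added to `ident_ranges`. Using `ident_ranges.get(ident, ())` instead would avoid that insertion.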
