Search in a two-dimensional array in Python


Question

I'd like to retrieve specific rows from a large dataset (9M lines, 1.4 GB) given two or more parameters, using Python.

For example, from this dataset:

ID1 2   10  2   2   1   2   2   2   2   2   1

ID2 10  12  2   2   2   2   2   2   2   1   2

ID3 2   22  0   1   0   0   0   0   0   1   2

ID4 14  45  0   0   0   0   1   0   0   1   1

ID5 2   8   1   1   1   1   1   1   1   1   2

Given these example parameters:

  • the second column must equal 2, and
  • the third column must be in the range 4 to 15

I should get:

ID1 2   10  2   2   1   2   2   2   2   2   1

ID5 2   8   1   1   1   1   1   1   1   1   2

The problem is that I don't know how to do these operations efficiently on a two-dimensional array in Python.

This is what I've tried:

line_list = []

# Load the whole file into memory ('somefile' is a placeholder path)
with open('somefile') as file:
    for line in file:
        line_list.append(line)

# Set conditions
i = 2
start_range = 4
end_range = 15

# Iterate through the loaded list and split each line into columns
for line in line_list:
    data = line.strip().split()
    # Test whether the current line matches the conditions;
    # the fields are strings, so convert them before comparing
    if int(data[1]) == i and start_range <= int(data[2]) <= end_range:
        print(data)

I'd like to run this process many times, and the way I'm doing it is really slow, even with the data file loaded in memory.

I was thinking about using NumPy arrays, but I don't know how to retrieve rows given conditions.
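For reference, condition-based row selection in NumPy is usually written with a boolean mask. The sketch below is only an illustration on the five sample rows: it assumes the numeric columns fit in memory as one integer array and that the IDs are kept in a separate array, and the names ids, values and mask are made up for this example.

```python
import numpy as np

# Numeric columns of the sample rows from the question (IDs kept separately).
ids = np.array(["ID1", "ID2", "ID3", "ID4", "ID5"])
values = np.array([
    [2, 10, 2, 2, 1, 2, 2, 2, 2, 2, 1],
    [10, 12, 2, 2, 2, 2, 2, 2, 2, 1, 2],
    [2, 22, 0, 1, 0, 0, 0, 0, 0, 1, 2],
    [14, 45, 0, 0, 0, 0, 1, 0, 0, 1, 1],
    [2, 8, 1, 1, 1, 1, 1, 1, 1, 1, 2],
])

# values[:, 0] is the file's second column, values[:, 1] its third.
# Each comparison yields a boolean array; & combines them element-wise.
mask = (values[:, 0] == 2) & (values[:, 1] >= 4) & (values[:, 1] <= 15)

# Indexing with the mask keeps only the rows where it is True.
selected_ids = ids[mask]
selected_rows = values[mask]
print(selected_ids)
print(selected_rows)
```

The parentheses around each comparison matter, because & binds more tightly than == and >= in Python.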

Thanks for your help!

As suggested, I used a relational database system. I chose SQLite3 because it is easy to use and quick to deploy.

My file was loaded through the import function in sqlite3 in roughly 4 minutes.

I created an index on the second and third columns to speed up retrieval.

The query was done through Python, with the "sqlite3" module.
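A rough sketch of that setup, not the exact code I ran: the table name data, the column names col2 and col3, and the helper find_rows are hypothetical. The rows are inserted from Python into an in-memory database here, rather than through the sqlite3 import function, so that the example is self-contained.

```python
import sqlite3

# Sample rows from the question; only the first three columns are kept here.
sample = [
    ("ID1", 2, 10),
    ("ID2", 10, 12),
    ("ID3", 2, 22),
    ("ID4", 14, 45),
    ("ID5", 2, 8),
]

conn = sqlite3.connect(":memory:")  # a real run would use a database file on disk
conn.execute("CREATE TABLE data (id TEXT, col2 INTEGER, col3 INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", sample)

# Composite index on the two columns used in the WHERE clause.
conn.execute("CREATE INDEX idx_col2_col3 ON data (col2, col3)")


def find_rows(conn, value, low, high):
    """Return rows whose col2 equals value and whose col3 lies in [low, high]."""
    query = (
        "SELECT id, col2, col3 FROM data "
        "WHERE col2 = ? AND col3 BETWEEN ? AND ? ORDER BY id"
    )
    return conn.execute(query, (value, low, high)).fetchall()


matches = find_rows(conn, 2, 4, 15)
print(matches)
```

The ? placeholders let the same query be reused with different parameters on every call.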

That's it, and it is much faster!

Recommended answer

I'd go for almost what you've got (untested):

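with open('somefile') as fin:
    # Lazily split each line into fields; nothing is read until iteration
    rows = (line.split() for line in fin)
    # Keep rows whose second field is 2 and whose third is between 4 and 15
    take = (row for row in rows if int(row[1]) == 2 and 4 <= int(row[2]) <= 15)
    # data = list(take)
    for row in take:
        pass # do something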
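To try this without a file, the same filter can be wrapped in a generator function and fed the sample rows from the question. The name matching_rows and its parameters are made up for this sketch, and it assumes every non-blank line has at least three whitespace-separated fields.

```python
def matching_rows(lines, value, low, high):
    """Yield split rows whose 2nd field equals value and whose 3rd is in [low, high]."""
    for line in lines:
        row = line.split()
        if not row:
            continue  # skip blank lines
        if int(row[1]) == value and low <= int(row[2]) <= high:
            yield row


# Sample data from the question, used in place of an open file.
sample = """\
ID1 2   10  2   2   1   2   2   2   2   2   1
ID2 10  12  2   2   2   2   2   2   2   1   2
ID3 2   22  0   1   0   0   0   0   0   1   2
ID4 14  45  0   0   0   0   1   0   0   1   1
ID5 2   8   1   1   1   1   1   1   1   1   2
"""

result = list(matching_rows(sample.splitlines(), 2, 4, 15))
print([row[0] for row in result])
```

Because the function accepts any iterable of lines, an open file object can be passed in the same way as the list of strings used here.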
