Python fast way to get data from multiple files in single numpy array


Problem description

I need to read in data which is stored in many files of the same format but varying length, i.e. identical columns but a varying number of rows. Furthermore, I need each column of the data to be stored in one array (preferably a numpy array, but a list is also acceptable).

For now, I read in every file in a loop with numpy.loadtxt() and then concatenate the resulting arrays. Say the data consists of 3 columns and is stored in the two files "foo" and "bar":

import numpy as np
filenames = ["foo", "bar"]
col1_all = 0  # data will be stored in these 3 arrays
col2_all = 0
col3_all = 0
for f in filenames:
    col1, col2, col3 = np.loadtxt(f, unpack=True)
    if col1.shape[0] > 0:  # I can't guarantee a file won't be empty
        if type(col1_all) == int:
            # if no data has been read in yet, just copy the arrays
            col1_all = col1[:]
            col2_all = col2[:]
            col3_all = col3[:]
        else:
            col1_all = np.concatenate((col1_all, col1))
            col2_all = np.concatenate((col2_all, col2))
            col3_all = np.concatenate((col3_all, col3))

My question is: is there a better/faster way to do this? I need this to be as quick as possible, as I have to read in hundreds of files.

I could imagine, for example, that first finding out how many rows there will be in total, "allocating" an array big enough to fit all the data, and then copying the read-in data into that array might perform better, as it avoids the concatenations. I don't know the total number of rows in advance, so this counting would have to be done in Python as well.

Another idea would be to first read in all the data, store each read-in array separately, and concatenate them at the end. (Or, as this essentially gives me the total number of rows, to allocate an array that fits all the data and then copy the data into it.)

Does anyone have experience with which approach works best?

Recommended answer

Don't concatenate each file with the rest as you go; read everything into a list and build the result at the end:

import numpy as np
filenames = ["foo", "bar"]
data = np.concatenate([np.loadtxt(f) for f in filenames])

If you like, you can split data into columns afterwards, but in most cases this is not a good idea.
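If the separate column arrays really are needed, the 2-D result can be unpacked via a transpose. A small sketch, using a made-up array in place of the concatenated file data:

```python
import numpy as np

# Stand-in for the concatenated result of np.loadtxt over all files.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# data.T has shape (3, n_rows), so unpacking it yields one array per column.
col1_all, col2_all, col3_all = data.T
```

Note that the unpacked arrays are views into `data`, not copies, so this costs essentially no extra memory.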

