Python fast way to get data from multiple files in single numpy array


Problem description

I need to read in data which is stored in many files of the same format but varying length, i.e. identical columns but a varying number of rows. Furthermore, I need each column of the data to be stored in one array (preferably a numpy array, but a list is also acceptable).

For now, I read in every file in a loop with numpy.loadtxt() and then concatenate the resulting arrays. Say the data consists of 3 columns and is stored in the two files "foo" and "bar":

import numpy as np
filenames = ["foo", "bar"]
col1_all = 0  # data will be stored in these 3 arrays
col2_all = 0
col3_all = 0
for f in filenames:
    col1, col2, col3 = np.loadtxt(f, unpack=True)
    if col1.shape[0] > 0:  # I can't guarantee a file won't be empty
        if type(col1_all) == int:
            # if no data has been read in yet, just copy the arrays
            col1_all = col1[:]
            col2_all = col2[:]
            col3_all = col3[:]
        else:
            col1_all = np.concatenate((col1_all, col1))
            col2_all = np.concatenate((col2_all, col2))
            col3_all = np.concatenate((col3_all, col3))

My question is: is there a better/faster way to do this? I need this to be as quick as possible, as I have to read in hundreds of files.

I could imagine, for example, that first finding out how many rows there will be in total, "allocating" an array big enough to fit all the data, and then copying the read-in data into that array might perform better, as it avoids the concatenations. I don't know the total number of rows in advance, so this counting would have to be done in Python as well.

Another idea would be to first read in all the data, store each read-in array separately, and concatenate them at the end. (Or, as this essentially gives me the total number of rows, to allocate an array that fits all the data and then copy the data into it.)

Does anyone have experience with which approach works best?

Recommended answer

Don't concatenate each file with the rest as you go; read everything into a list and build the result at the end:

import numpy as np
filenames = ["foo", "bar"]
data = np.concatenate([np.loadtxt(f) for f in filenames])

If you like, you can split data into columns afterwards, but in most cases this is not a good idea.
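If the separate column arrays really are needed, the 2-D result can be unpacked via a transpose. A small sketch, using a made-up array in place of the concatenated file data:

```python
import numpy as np

# Stand-in for the concatenated result of np.loadtxt over all files.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# data.T has shape (3, n_rows), so unpacking it yields one array per column.
col1_all, col2_all, col3_all = data.T
```

Note that the unpacked arrays are views into `data`, not copies, so this costs essentially no extra memory.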

