Python generator to read large CSV file


Problem description

I need to write a Python generator that yields tuples (X, Y) coming from two different CSV files.

It should receive a batch size on init, read the two CSVs line by line, and yield a tuple (X, Y) for each line, where X and Y are arrays (the columns of the CSV files).

I've looked at examples of lazy reading, but I'm finding it difficult to adapt them to CSVs:

  • Lazy Method for Reading Big File in Python?
  • Read large text files in Python, line by line without loading it in to memory

Also, unfortunately, Pandas DataFrames are not an option in this case.

Is there any snippet I can start from?

Thanks

Recommended answer

You can write a generator that reads rows from two different csv readers and yields them as pairs of arrays. The code for that is:

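import csv
import numpy as np

def getData(filename1, filename2):
    # On Python 3 the csv module expects text-mode files opened with newline=""
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        reader1 = csv.reader(csv1)
        reader2 = csv.reader(csv2)
        # zip is lazy on Python 3, so only one row per file is held at a time
        for row1, row2 in zip(reader1, reader2):
            # This will give arrays of floats; for other types change dtype
            # (np.float was removed from NumPy, so the builtin float is used)
            yield (np.array(row1, dtype=float),
                   np.array(row2, dtype=float))

for tup in getData("file1", "file2"):
    print(tup)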
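
The question also mentions a batch size, which the generator above does not use: it yields one pair per row. A minimal sketch of a batched variant follows, assuming Python 3, purely numeric CSV files without header rows, and the same number of columns in every row of a file. The names get_batches and batch_size are hypothetical and not part of the original answer.

```python
import csv
from itertools import islice

import numpy as np


def get_batches(filename1, filename2, batch_size):
    """Yield (X, Y) pairs of 2-D float arrays with up to batch_size rows each."""
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        reader1 = csv.reader(csv1)
        reader2 = csv.reader(csv2)
        # zip is lazy on Python 3: rows are pulled from both files on demand
        pairs = zip(reader1, reader2)
        while True:
            # Take at most batch_size row pairs; only this chunk is in memory
            chunk = list(islice(pairs, batch_size))
            if not chunk:
                return
            # Split the list of (row1, row2) pairs back into two row groups
            rows1, rows2 = zip(*chunk)
            yield np.array(rows1, dtype=float), np.array(rows2, dtype=float)
```

It could be consumed with a loop such as `for X, Y in get_batches("file1", "file2", batch_size=32):`. The last batch may contain fewer than batch_size rows, and because zip stops at the shorter input, any extra rows in the longer file are silently ignored.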
