How to read a CSV file from a stream and process each line as it is written?


Question

I would like to read a CSV file from standard input and process each row as it arrives. My CSV-writing code emits rows one by one, but my reader waits for the stream to terminate before it starts iterating over the rows. Is this a limitation of the csv module? Am I doing something wrong?

My reader code (test_reader.py):

import csv
import sys
import time


reader = csv.reader(sys.stdin)
for row in reader:
    print "Read: (%s) %r" % (time.time(), row)

My writer code (test_writer.py):

import csv
import sys
import time


writer = csv.writer(sys.stdout)
for i in range(8):
    writer.writerow(["R%d" % i, "$" * (i+1)])
    sys.stdout.flush()
    time.sleep(0.5)

Output of python test_writer.py | python test_reader.py:

Read: (1309597426.3) ['R0', '$']
Read: (1309597426.3) ['R1', '$$']
Read: (1309597426.3) ['R2', '$$$']
Read: (1309597426.3) ['R3', '$$$$']
Read: (1309597426.3) ['R4', '$$$$$']
Read: (1309597426.3) ['R5', '$$$$$$']
Read: (1309597426.3) ['R6', '$$$$$$$']
Read: (1309597426.3) ['R7', '$$$$$$$$']

As you can see, all the print statements are executed at the same time, but I expect there to be a 500 ms gap between them.

Recommended answer

As the documentation says:

"In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer."

And you can see by looking at the implementation of the csv module (line 784) that csv.reader calls the next() method of the underlying iterator (via PyIter_Next).
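The point that csv.reader simply pulls lines from whatever iterator it is given can be checked with a small sketch. The names logged_lines and events below are hypothetical helpers introduced only for this illustration. The sketch records when each line is requested and when each row is produced, and it suggests that the reader asks for one line per row, so any delay comes from the file object's own iteration rather than from csv.reader:

```python
import csv

events = []  # order in which lines are pulled and rows are produced


def logged_lines(lines):
    """Yield lines one at a time, recording when each one is requested."""
    for line in lines:
        events.append(("pulled", line.strip()))
        yield line


# csv.reader accepts any iterator that yields strings, not just file objects.
reader = csv.reader(logged_lines(["R0,$\n", "R1,$$\n"]))
for row in reader:
    events.append(("row", row))

for event in events:
    print(event)
```

The "pulled" and "row" events alternate, which is consistent with the reader consuming its input lazily.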

So if you really want unbuffered reading of CSV files, you need to convert the file object (here sys.stdin) into an iterator whose next() method actually calls readline() instead. This can easily be done using the two-argument form of the iter function. So change the code in test_reader.py to something like this:

for row in csv.reader(iter(sys.stdin.readline, '')):
    print("Read: ({}) {!r}".format(time.time(), row))

For example:

$ python test_writer.py | python test_reader.py
Read: (1388776652.964925) ['R0', '$']
Read: (1388776653.466134) ['R1', '$$']
Read: (1388776653.967327) ['R2', '$$$']
Read: (1388776654.468532) ['R3', '$$$$']
[etc]
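The two-argument form iter(callable, sentinel) calls the callable repeatedly and stops once it returns the sentinel. With readline, the sentinel is the empty string, which is what readline() returns at end of file. Here is a minimal self-contained sketch of the same pattern. It uses io.StringIO as a stand-in for sys.stdin, an assumption made only so the example runs without a pipe:

```python
import csv
import io

# io.StringIO stands in for sys.stdin here (an assumption for illustration).
stream = io.StringIO("R0,$\nR1,$$\n")

# iter(callable, sentinel) calls stream.readline() until it returns '',
# the value readline() gives at end of file.
lines = iter(stream.readline, '')

# csv.reader pulls one line at a time from the readline-backed iterator.
rows = [row for row in csv.reader(lines)]
print(rows)
```

With a real pipe, replacing stream with sys.stdin gives the reader loop shown above.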

Can you explain why you need unbuffered reading of CSV files? There might be a better solution to whatever it is you are trying to do.
