Iterate from a certain row of a csv file in Python

Views: 307  Posted: 2017/2/25 0:36:15  Tags: python, python-3.x, csv

Problem description

I have a csv file with many millions of rows. I want to start iterating from row 10,000,000. At the moment I have this code:

    import csv

    with open(csv_file, encoding='UTF-8') as f:
        r = csv.reader(f)
        for row_number, row in enumerate(r):
            # Skip rows 0..9,999,999; process everything after them.
            if row_number < 10000000:
                continue
            process_row(row)

This works, but it takes several seconds before the rows of interest appear. Presumably all the unneeded rows are loaded into Python unnecessarily, which slows it down. Is there a way to start the iteration at a certain row, i.e. without reading in the start of the data?

Recommended answer

You can use islice:

import csv
from itertools import islice

with open(csv_file, encoding='UTF-8') as f:
    r = csv.reader(f)
    # Skip the first 10,000,000 rows, then yield the rest.
    for row in islice(r, 10000000, None):
        process_row(row)

It still iterates over all the rows, but it does so a lot more efficiently: in CPython the skipped rows are discarded inside islice instead of passing through a Python-level loop.
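As a small check of the point above, the following sketch runs both the enumerate loop and islice over an in-memory CSV and confirms that they yield the same rows. The sample data, the helper names skip_with_enumerate and skip_with_islice, and the use of io.StringIO in place of a real file are all assumptions made for illustration.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data standing in for the large file: rows "0,a" .. "9,a".
SAMPLE = "".join(f"{i},a\n" for i in range(10))


def skip_with_enumerate(text, start):
    """Python-level loop: one comparison per skipped row."""
    rows = []
    for row_number, row in enumerate(csv.reader(io.StringIO(text))):
        if row_number < start:
            continue
        rows.append(row)
    return rows


def skip_with_islice(text, start):
    """islice discards the first `start` rows before yielding anything."""
    return list(islice(csv.reader(io.StringIO(text)), start, None))


rows_a = skip_with_enumerate(SAMPLE, 7)
rows_b = skip_with_islice(SAMPLE, 7)
print(rows_a == rows_b)
print(rows_b)
```

Both helpers still parse every skipped row with csv.reader; the only difference is where the skipping loop runs.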

You could also use the consume recipe, which calls functions that consume iterators at C speed. Call it on the file object before you pass the file to csv.reader, so that you also avoid needlessly parsing the skipped lines with the reader:

import collections
import csv
from itertools import islice

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)


with open(csv_file, encoding='UTF-8') as f:
    # Skip the first 10,000,000 lines, the same count as islice(r, 10000000, None).
    consume(f, 10000000)
    r = csv.reader(f)
    for row in r:
        process_row(row)

As Shadowranger commented, if the file could contain embedded newlines, then you would have to consume the reader instead and pass newline="" to open. If that is not the case, do consume the file object, as the performance difference will be considerable, especially if you have a lot of columns.
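To illustrate why embedded newlines matter, here is a minimal sketch that skips two "rows" both ways on a tiny in-memory CSV whose second record contains a quoted line break. The data and the helper names rows_after_skipping_lines and rows_after_skipping_records are hypothetical, and io.StringIO(text, newline="") stands in for open(csv_file, newline="").

```python
import csv
import io
from itertools import islice

# Hypothetical data: the second record has a quoted field spanning two lines,
# so there are 4 physical lines but only 3 CSV records.
DATA = 'id,note\r\n1,"first line\nsecond line"\r\n2,plain\r\n'


def rows_after_skipping_lines(text, n):
    """Skip n physical lines on the file object, then parse the rest."""
    f = io.StringIO(text, newline="")
    next(islice(f, n, n), None)  # same trick as consume(f, n)
    return list(csv.reader(f))


def rows_after_skipping_records(text, n):
    """Skip n parsed records on the reader, which understands quoting."""
    f = io.StringIO(text, newline="")
    return list(islice(csv.reader(f), n, None))


by_lines = rows_after_skipping_lines(DATA, 2)
by_records = rows_after_skipping_records(DATA, 2)
print(by_lines)    # starts in the middle of the quoted field
print(by_records)  # starts cleanly at the third record
```

Skipping on the file object counts physical lines, so it can land inside a quoted field; skipping on the reader counts records, at the cost of parsing every skipped row.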
