Efficiently Read last 'n' rows of CSV into DataFrame

Problem description

A few methods to do this:

  1. Read the entire CSV and then use df.tail
  2. Somehow reverse the file (what's the best way to do this for large files?) and then use the nrows argument to read
  3. Somehow find the number of rows in the CSV, then use skiprows and read required number of rows.
  4. Maybe do chunk read discarding initial chunks (though not sure how this would work)

Can it be done in some easier way? If not, which of these should be preferred and why?

Possibly related:

  1. Efficiently finding the last line in a text file
  2. Reading parts of ~13000 row CSV file with pandas read_csv and nrows

Not directly related:

  1. How to get the last n row of pandas dataframe?

Solution

I don't think pandas offers a way to do this in read_csv.

Perhaps the neatest way (in one pass) is to use collections.deque with a maximum length, which keeps only the last n lines as the file is iterated:

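from collections import deque
from io import StringIO  # on Python 2: from StringIO import StringIO

import pandas as pd

with open(fname, 'r') as f:
    # A deque with maxlen=2 drops older lines as newer ones arrive,
    # so only the last 2 lines remain once the file is exhausted.
    q = deque(f, 2)  # replace 2 with n (number of lines to keep from the end)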

In [12]: q
Out[12]: deque(['7,8,9\n', '10,11,12'], maxlen=2)
         # these are the last two lines of my csv

In [13]: pd.read_csv(StringIO(''.join(q)), header=None)
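
The session above reads a headerless file. As a minimal sketch of how the same deque idea might be wrapped up for a file that has a header row, the helper below reads the header line first and then keeps only the last n data lines. The name read_csv_tail and the has_header parameter are made up for illustration and are not part of pandas, and the sketch assumes no quoted field contains an embedded newline, since it works line by line.

```python
from collections import deque
from io import StringIO

import pandas as pd


def read_csv_tail(path, n, has_header=True):
    """Return the last n data rows of the CSV at `path` as a DataFrame.

    Hypothetical helper, not a pandas API. Assumes one record per line.
    """
    with open(path, "r") as f:
        # Consume the header line first so it is never discarded by the deque.
        header = f.readline() if has_header else ""
        # A bounded deque drops older lines as newer ones arrive, so at most
        # n lines are held in memory while the rest of the file is scanned.
        tail = deque(f, maxlen=n)
    buffer = StringIO(header + "".join(tail))
    return pd.read_csv(buffer, header=0 if has_header else None)
```

For instance, read_csv_tail('data.csv', 5) would return the last five data rows of a file named data.csv (a placeholder name) with its column names intact. The whole file is still scanned once, but only n lines plus the header are ever parsed by pandas.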

Another option worth trying is to count the lines in a first pass and then read the file again with read_csv, using skiprows to skip that number of rows (minus n)...
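
A rough sketch of that two-pass idea follows. The function name read_csv_tail_two_pass is hypothetical, and the sketch assumes the file has a single header row, one record per line (no embedded newlines in quoted fields), and no blank lines. It passes a range of line numbers to skiprows so that the header on line 0 is kept.

```python
import pandas as pd


def read_csv_tail_two_pass(path, n):
    """Count lines first, then have read_csv skip all but the last n data rows.

    Hypothetical helper, not a pandas API. Assumes a one-line header.
    """
    # First pass: count every line in the file (header plus data rows).
    with open(path, "r") as f:
        total_lines = sum(1 for _ in f)

    data_rows = max(total_lines - 1, 0)  # exclude the header line
    n_skip = max(data_rows - n, 0)       # data rows to drop from the top

    # Second pass: skip file lines 1..n_skip; line 0 (the header) is kept.
    return pd.read_csv(path, skiprows=range(1, n_skip + 1))
```

Compared with the deque approach, this reads the file twice, but the second pass hands the file directly to read_csv, so its usual options (dtype, parse_dates and so on) can be passed without building an intermediate string.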
