Load a certain number of rows from CSV with numpy

Views: 151 · Published: 2017/2/24 18:56:11 · Tags: python csv numpy genfromtxt


Question

I have a very long file and I only need a part of it, a slice. New data keeps coming in, so the file may get longer.

To load the data from the CSV I use numpy.genfromtxt:
    np.genfromtxt(filename, usecols={col}, delimiter=",", skip_header=skip_head)

This cuts off a certain part at the beginning of the file, which already speeds up loading the data substantially. But I can't use skip_footer to cut off the part after the slice that I want to use.

What I want is to load only a certain number of rows. For example, say I skip the first 100 rows, then load the next 50 rows, and skip everything after that.

edit: I am using Python 3.4

edit: sample file: http://www.file-upload.net/download-10819938/sample.txt.html

Recommended answer
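You could get the slice using itertools.islice, taking the column with itemgetter (this version is for Python 2, since itertools.imap does not exist in Python 3):
import csv
from itertools import islice, imap  # imap is Python 2 only
from operator import itemgetter

import numpy as np

with open(filename) as f:
    r = csv.reader(f)
    # keep zero-based rows start..end (inclusive) and take column 1 of each
    np.genfromtxt(imap(itemgetter(1), islice(r, start, end + 1)))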

 
For Python 3, you can use np.fromiter with the code above, but you need to specify the dtype:
import numpy as np
from operator import itemgetter
import csv
with open("sample.txt") as f:
   from itertools import islice
   r = csv.reader(f)
   print(np.fromiter(map(itemgetter(0), islice(r,  start, end+1)), dtype=float))
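
A self-contained variant of the Python 3 snippet above, as a minimal sketch. The temporary file, its contents, and the `start`/`end` values are made up for illustration and are not from the original post. It relies on `islice(r, start, end + 1)` yielding the rows with zero-based indices `start` through `end` inclusive.

```python
import csv
import os
import tempfile
from itertools import islice
from operator import itemgetter

import numpy as np

# Hypothetical sample data: row i holds (i, i*10, i*100).
rows = [(i, i * 10, i * 100) for i in range(10)]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    csv.writer(tmp).writerows(rows)
    path = tmp.name

start, end = 2, 5  # assumed bounds: keep zero-based rows 2..5 inclusive

with open(path, newline="") as f:
    r = csv.reader(f)
    # islice skips the first `start` rows and stops reading after row `end`;
    # itemgetter(1) picks the second field of each row (still a string),
    # and fromiter converts each value to float.
    col = np.fromiter(map(itemgetter(1), islice(r, start, end + 1)), dtype=float)

os.remove(path)
print(col)
```

Because `islice` stops pulling rows once it reaches `end`, lines appended to the file later are never read.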

 
As in the other answer, you can also pass the islice object directly to genfromtxt, but for Python 3 you will need to open the file in binary mode:
with open("sample.txt", "rb") as f:
    from itertools import islice
    print(np.genfromtxt(islice(f, start, end+1), delimiter=",", usecols=cols))
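
The same idea as a runnable sketch. The file contents and the values of `start`, `end`, and `cols` are assumptions chosen for illustration; the binary open mode follows the answer's advice.

```python
import os
import tempfile
from itertools import islice

import numpy as np

# Hypothetical sample data: 10 lines of the form "i,i*10,i*100".
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.writelines(f"{i},{i * 10},{i * 100}\n" for i in range(10))
    path = tmp.name

start, end = 2, 5  # assumed bounds: keep zero-based lines 2..5 inclusive
cols = (0, 2)      # assumed column selection

with open(path, "rb") as f:
    # genfromtxt only ever sees the lines that islice lets through,
    # so it parses 4 lines instead of the whole file.
    arr = np.genfromtxt(islice(f, start, end + 1), delimiter=",", usecols=cols)

os.remove(path)
print(arr.shape)
```

Here the slicing happens on raw lines before any parsing, so `usecols` and `delimiter` keep working exactly as they do when genfromtxt is given a filename.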

Interestingly, for multiple columns, using itertools.chain and reshaping is over twice as efficient if all your dtypes are the same:
import csv
import numpy as np
from itertools import chain, islice
from operator import itemgetter
with open("sample.txt") as f:
    r = csv.reader(f)
    # flatten columns 0, 4, 10 of rows 4..9, then reshape to 6 rows
    arr = np.fromiter(chain.from_iterable(map(itemgetter(0, 4, 10), islice(r, 4, 10))),
                      dtype=float).reshape(6, -1)
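
A self-contained sketch of the chain-and-reshape approach. The generated file, the chosen columns, and the names `pick`, `start`, and `stop` are illustrative assumptions. It reshapes with `-1` for the row count so the number of rows does not have to be hard-coded.

```python
import csv
import os
import tempfile
from itertools import chain, islice
from operator import itemgetter

import numpy as np

# Hypothetical sample data: row i holds (i, i*10, i*100, i*1000).
rows = [(i, i * 10, i * 100, i * 1000) for i in range(10)]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    csv.writer(tmp).writerows(rows)
    path = tmp.name

start, stop = 4, 10          # islice(r, 4, 10) yields rows 4..9
pick = itemgetter(0, 2, 3)   # assumed column selection (three columns)

with open(path, newline="") as f:
    r = csv.reader(f)
    # Each row becomes a 3-tuple of strings; chain.from_iterable flattens
    # the tuples into one stream that fromiter converts to a 1-D float array.
    flat = np.fromiter(chain.from_iterable(map(pick, islice(r, start, stop))),
                       dtype=float)

# One row per CSV line, three selected columns per row.
arr = flat.reshape(-1, 3)

os.remove(path)
print(arr.shape)
```

This only works when every selected column can share one dtype, which matches the condition stated in the answer.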

 
On your sample file:
In [27]: %%timeit
with open("sample.txt", "rb") as f:
    (np.genfromtxt(islice(f, 4, 10), delimiter=",", usecols=(0, 4, 10),dtype=float))
   ....: 

10000 loops, best of 3: 179 µs per loop

In [28]: %%timeit
with open("sample.txt") as f:
   r = csv.reader(f)
   (np.fromiter(chain.from_iterable(map(itemgetter(0, 4, 10), islice(r,  4, 10))), dtype=float).reshape(6, -1))

10000 loops, best of 3: 86 µs per loop

