Reading a few lines as Numpy array from a huge file


Problem description


I have a text file that contains a billion words and their corresponding 300-dimensional word vectors. I need to extract a few thousand words and their word vectors from the file and store them as Numpy arrays. The size of the text file is around 1 GB.
Naively, I tried to load the whole file into an array using genfromtxt, but that did not work. Then I tried to read the whole file line by line (each line in the file consists of a word and its word vector), looking for each word and extracting its word vector, but I guess that requires one pass over the file per word, and as I need thousands of words, it would mean iterating over the whole file thousands of times.
What would be the fastest and most efficient way to do this?

Recommended answer


I don't know if this is the fastest way (probably not), but it works reasonably well (I tested it on a file with >100,000 lines):

import numpy as np

# word_set is the set of words to extract; fn is the path to the vector file.
# The filter keeps only non-empty lines whose first token is in word_set,
# so the whole file is scanned exactly once.
F = filter(lambda s: s.strip().split()[0] in word_set if s.strip() else False,
           open(fn, 'rt'))
x = np.genfromtxt(F, *yourargs, **yourkwds)


This is for Python 2. In Python 3 it seems one has to .encode() the input.
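The Python 3 variant of the same single-pass idea can be sketched as follows. The file path, the words, and the 3-dimensional vectors here are hypothetical stand-ins (the real file uses 300 dimensions); each kept line is .encode()d before being handed to genfromtxt, as the answer notes:

```python
import os
import tempfile

import numpy as np

# Hypothetical sample file: one word per line followed by its vector
# (3 dimensions here as a stand-in for the real 300).
fn = os.path.join(tempfile.mkdtemp(), "vectors.txt")
with open(fn, "wt") as fh:
    fh.write("apple 0.1 0.2 0.3\n"
             "pear 0.4 0.5 0.6\n"
             "banana 0.7 0.8 0.9\n")

word_set = {"apple", "banana"}  # the words whose vectors we want

# Single pass over the file: keep only non-empty lines whose first token
# is in word_set, and .encode() each kept line for genfromtxt on Python 3.
with open(fn, "rt") as fh:
    F = (line.encode() for line in fh
         if line.strip() and line.split()[0] in word_set)
    # usecols skips the leading word column, leaving a float array
    # with one row per extracted word, in file order.
    vectors = np.genfromtxt(F, usecols=(1, 2, 3))

print(vectors)
```

Because the generator filters lines before genfromtxt ever sees them, only the matching rows are parsed, which is what keeps this a single cheap pass over the 1 GB file.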

