Querying a NumPy array of NumPy arrays saved as an npz is slow

Views: 1525 · Published: 2016/6/3 10:09:58 · Tags: python arrays performance numpy


Question

I generate an npz file as follows:

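import numpy as np

# Generate npz file
dataset_text_filepath = 'test_np_load.npz'
texts = []
for text_number in range(30000):
    # randint excludes the upper bound, so 20001 and 101 keep the inclusive
    # ranges of the deprecated np.random.random_integers(0, 20000) and (0, 100)
    texts.append(np.random.randint(0, 20001,
                 size=np.random.randint(0, 101)))
# The sub-arrays have different lengths, so request an object array explicitly
texts = np.array(texts, dtype=object)
np.savez(dataset_text_filepath, texts=texts)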

This gives me a ~7 MiB npz file that contains a single variable, texts, which is a NumPy array of NumPy arrays.

I load it with numpy.load():

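# Load data
# (object arrays are stored pickled, so recent NumPy versions need allow_pickle=True)
dataset = np.load(dataset_text_filepath, allow_pickle=True)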

If I query it as follows, it takes several minutes:

# Querying data: the slow way
for i in range(20):
    print('Run {0}'.format(i))
    random_indices = np.random.randint(0, len(dataset['texts']), size=10)
    dataset['texts'][random_indices]

By contrast, if I query it as follows, it takes less than 5 seconds:

# Querying data: the fast way
data_texts = dataset['texts']
for i in range(20):
    print('Run {0}'.format(i))
    random_indices = np.random.randint(0, len(data_texts), size=10)
    data_texts[random_indices]

Why is the second method so much faster than the first one?

Recommended answer

dataset['texts'] reads the file each time it is used. Calling load on an npz file just returns a file loader, not the actual data. It is a "lazy loader" that loads a particular array only when it is accessed. In the slow version, dataset['texts'] appears twice per iteration (once in len() and once for the indexing), so the whole array is read from the file 40 times, whereas the fast version reads it once. The load docs could be clearer, but they say:
- If the file is a ``.npz`` file, the returned value supports the context
  manager protocol in a similar fashion to the open function::

    with load('foo.npz') as data:
        a = data['a']

  The underlying file descriptor is closed when exiting the 'with' block.

And from the savez docs:
When opening the saved ``.npz`` file with `load` a `NpzFile` object is
returned. This is a dictionary-like object which can be queried for
its list of arrays (with the ``.files`` attribute), and for the arrays
themselves.
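
The lazy loading described in the answer can be checked directly. The following is a minimal sketch, assuming a small throwaway file (the name lazy_demo.npz and its contents are made up for illustration): each key lookup on the loaded object should return a freshly read array, while a local variable keeps referring to one array in memory.

```python
import os
import tempfile

import numpy as np

# Hypothetical demo file; the name and contents are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "lazy_demo.npz")
np.savez(path, texts=np.arange(10))

with np.load(path) as dataset:
    # np.load on an .npz returns a loader object, not the arrays themselves.
    loader_type = type(dataset).__name__
    names = dataset.files              # names of the arrays in the archive

    # Each lookup reads the array from the archive again,
    # so two lookups produce two distinct array objects.
    first = dataset["texts"]
    second = dataset["texts"]
    distinct_objects = first is not second

    # Binding the array to a name once avoids any further file reads.
    data_texts = dataset["texts"]

print(loader_type, names, distinct_objects)
```

The identity check is only a convenient proxy for "the file was read again"; it relies on the loader not caching arrays between lookups.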

More details are available in help(np.lib.npyio.NpzFile).
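
To compare the two query patterns from the question on a smaller scale, a rough timing sketch might look like the following. It assumes a hypothetical stand-in dataset of 2000 variable-length arrays (the file name timing_demo.npz, the sizes, and the fixed seed are illustrative choices, not taken from the question), and the measured times will vary by machine.

```python
import os
import tempfile
import time

import numpy as np

# Hypothetical stand-in for the question's data: 2000 variable-length arrays.
rng = np.random.default_rng(0)
texts = np.empty(2000, dtype=object)
for i in range(len(texts)):
    texts[i] = rng.integers(0, 20001, size=rng.integers(0, 101))

path = os.path.join(tempfile.mkdtemp(), "timing_demo.npz")
np.savez(path, texts=texts)

indices = rng.integers(0, len(texts), size=10)

# Object arrays are pickled inside the archive, hence allow_pickle=True.
with np.load(path, allow_pickle=True) as dataset:
    start = time.perf_counter()
    for _ in range(20):
        slow_result = dataset["texts"][indices]   # re-reads the array each time
    slow_seconds = time.perf_counter() - start

    data_texts = dataset["texts"]                 # read from the file once

# The array is already in memory, so it stays usable after the file is closed.
start = time.perf_counter()
for _ in range(20):
    fast_result = data_texts[indices]
fast_seconds = time.perf_counter() - start

print(f"repeated lookup: {slow_seconds:.4f} s, local variable: {fast_seconds:.6f} s")
```

Loading inside the with block follows the context-manager usage quoted from the load docs; copying the one needed array into a local name before the block ends is what makes the later queries cheap.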
