numpy.memmap for an array of strings?

Question

Is it possible to use numpy.memmap to map a large disk-based array of strings into memory?

I know it can be done for floats and suchlike, but this question is specifically about strings.

I am interested in solutions for both fixed-length and variable-length strings.

The solution is free to dictate any reasonable file format.

Answer

If all the strings have the same length, as suggested by the term "array", this is easily possible:

import numpy
a = numpy.memmap("data", dtype="S10", mode="r")  # open an existing file "data" read-only

would be an example for strings of length 10 (dtype "S10" means 10-byte byte strings; shorter strings are padded with NUL bytes in the file).
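
A minimal sketch of the full round trip for the fixed-length case, assuming byte strings of at most 10 bytes. The temporary path and the sample words are illustrative only. It creates the file with mode "w+", then re-opens it read-only, letting NumPy infer the number of items from the file size:

```python
import os
import tempfile

import numpy as np

# Illustrative path; any writable location works.
path = os.path.join(tempfile.mkdtemp(), "data")

words = [b"alpha", b"beta", b"gamma"]

# Create a disk-backed array of 3 fixed-length (10-byte) strings.
out = np.memmap(path, dtype="S10", mode="w+", shape=(len(words),))
out[:] = words  # shorter strings are NUL-padded on disk
out.flush()
del out  # drop the writer before re-opening

# Re-open read-only; the shape is inferred as file_size // 10.
a = np.memmap(path, dtype="S10", mode="r")
print(a[1])  # b'beta' (trailing NULs are stripped on read)
```

Note that strings longer than the declared width are silently truncated on assignment, so the width must be chosen from the longest string. For text rather than bytes, dtype "U10" also works but stores 4 bytes per character.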

Edit: Since the strings apparently don't all have the same length, you need to index the file to allow O(1) item access. This requires reading the whole file once and storing the start offsets of all strings in memory. Unfortunately, I don't think there is a pure NumPy way of building this index without first creating a temporary array the same size as the file in memory. That array can be dropped after extracting the offsets, though.
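
A minimal sketch of that indexing approach, assuming a file format of UTF-8 strings each terminated by a newline (so the strings themselves cannot contain newlines). The file name and the helper `get` are hypothetical. The file is mapped as raw `uint8` bytes, and `raw == ord("\n")` is the temporary file-sized boolean array mentioned above:

```python
import os
import tempfile

import numpy as np

# Assumed format: each string is terminated by b"\n".
path = os.path.join(tempfile.mkdtemp(), "strings.txt")
with open(path, "wb") as f:
    f.write(b"apple\nbanana split\nkiwi\n")

# Map the file as raw bytes; pages are only read when touched.
raw = np.memmap(path, dtype=np.uint8, mode="r")

# One pass over the file. The comparison builds a temporary boolean
# array as large as the file; only the offsets are kept afterwards.
ends = np.flatnonzero(raw == ord("\n"))
# Each string starts right after the previous terminator.
starts = np.concatenate(([0], ends[:-1] + 1))

def get(i):
    """Return string i in O(1) using the in-memory offset index."""
    return raw[starts[i]:ends[i]].tobytes().decode("utf-8")

print(get(1))  # banana split
```

If rebuilding the index on every start is too slow, the `starts`/`ends` arrays could be saved with `np.save` and re-opened lazily with `np.load(..., mmap_mode="r")`, so neither the strings nor the index have to be fully loaded into memory.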