Reading unicode elements into numpy array

Views: 89 | Published: 2020/5/18 18:58:37 | Tags: python, unicode, numpy

Problem description

Consider a text file called "new.txt" containing the following elements:

μm
∂r
∆λ

In Python 2.7, I can read the file by typing:

>>> import codecs
>>> f = codecs.open('new.txt', encoding='utf-8')
>>> lines = [line.strip() for line in f.readlines()]
>>> lines
[u'\u03bcm', u'\u2202r', u'\u2206\u03bb']
>>> print lines[0]
μm

So far so good. I can easily convert this list to a numpy array via:

>>> import numpy as np
>>> arr = np.array(lines)
>>> arr
array([u'\u03bcm', u'\u2202r', u'\u2206\u03bb'], 
      dtype='<U2')

The issue is that I can't read this file directly via numpy's loadtxt function:

>>> np.loadtxt('new.txt', dtype=np.unicode_)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/numpy/lib/npyio.py", line 805, in loadtxt
    X = np.array(X, dtype)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

What is the correct way to read this file into numpy directly?

Thanks.

Recommended answer

In memory, unicode strings are represented as UCS-2 or UCS-4, depending on how your Python interpreter was compiled. Your file is encoded in UTF-8, so it has to be decoded before it can be mapped to a NumPy array. loadtxt() can't do the recoding for you; after all, NumPy is mainly targeted at numerical arrays.
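The decode-first idea described above can be sketched as follows. This is a minimal illustration assuming Python 3 and a recent NumPy, not the original poster's code; the temporary path and the variable names are made up for the example.

```python
import os
import tempfile

import numpy as np

# Recreate the sample file from the question in a temporary directory
# (hypothetical location, used only so the sketch is self-contained).
path = os.path.join(tempfile.mkdtemp(), "new.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("μm\n∂r\n∆λ\n")

# Decode from UTF-8 while reading, then strip the line endings.
with open(path, encoding="utf-8") as f:
    lines = [line.strip() for line in f]

# NumPy picks a fixed-width unicode dtype wide enough for the longest string.
arr = np.array(lines)
print(arr.dtype.kind, arr.dtype.itemsize)
```

Each element is stored as fixed-width UCS-4, so a two-character string takes 8 bytes in the array no matter how many bytes it occupied in the UTF-8 file.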

Assuming every line has the same number of characters, you could also use the more efficient variant:

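import codecs, numpy
# Python 2 only: reads the unicode object's raw buffer, so it assumes a UCS-4 build
s = codecs.open("new.txt", encoding="utf-8").read()
arr = numpy.frombuffer(s, dtype="<U3")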

This will include the newline characters in the strings. To not include them, use

arr = numpy.frombuffer(s.replace("\n", ""), dtype="<U2")
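On Python 3, a str no longer exposes its internal buffer, so the frombuffer calls above would not run unchanged. A rough equivalent, assuming Python 3 and equal-length lines, is to encode the text to little-endian UTF-32 first, which matches the layout of the "<U2" dtype. The variable names here are illustrative only.

```python
import numpy as np

# Decoded, equal-length strings with the newlines already removed.
lines = ["μm", "∂r", "∆λ"]

# "<U2" stores two code points per element as little-endian UCS-4
# (4 bytes each); the "utf-32-le" codec emits the same layout without a BOM.
buf = "".join(lines).encode("utf-32-le")

# Reinterpret the bytes as an array of 2-character strings (no copy is made,
# so the resulting array is read-only).
arr = np.frombuffer(buf, dtype="<U2")
print(arr)
```

The buffer length has to be an exact multiple of the item size, which is why this only works when every line has the same number of characters.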

Edit: If the lines of your file have different lengths and you would like to avoid the intermediate list, you can use

arr = numpy.fromiter(codecs.open("new.txt", encoding="utf-8"), dtype="<U2")

I'm not sure if this will internally create some temporary list, though.
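For lines of different lengths, the width of the dtype matters. The sketch below assumes Python 3 and NumPy 1.14 or later, where loadtxt accepts an `encoding` argument. The extra "μm/s" line and the temporary path are invented for the example and are not part of the original file.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample file with one longer line added for illustration.
path = os.path.join(tempfile.mkdtemp(), "new.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("μm\n∂r\n∆λ\nμm/s\n")

with open(path, encoding="utf-8") as f:
    lines = [line.strip() for line in f]

# Letting NumPy infer the dtype sizes it to the longest string.
inferred = np.array(lines)

# A fixed "<U2" dtype silently truncates anything longer than 2 characters.
truncated = np.array(lines, dtype="<U2")

# Assumes NumPy >= 1.14: loadtxt decodes the file itself when given `encoding`.
loaded = np.loadtxt(path, dtype=str, encoding="utf-8")

print(inferred.dtype, truncated.tolist(), loaded.tolist())
```

With a fixed-width dtype such as "<U2", the longer entry loses its tail without any warning, so inferring the width or choosing one at least as long as the longest line is the safer option.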
