Trying to strip b' ' from my Numpy array

Problem description

So I have what I feel is a very dumb problem.

I create an array from a file:

A1=np.loadtxt(file, dtype='a100')

After processing, I want to write that array to another file:

np.savetxt("Test.txt", A1, fmt='%s', delimiter=',')

Why is it writing out b'string'? I think I understand it's writing it out as bytes, but for the life of me I can't figure out how to write it out without the b''.

I know this is probably something incredibly easy I'm overlooking!

Recommended answer

A1 is loaded as an array of bytestrings. Python 3 uses unicode strings by default, so bytestrings are displayed with a b prefix, as in b'one.com'. That's normal with print. I'm a little surprised that savetxt does the same when writing the file.

In any case, this seems to do the trick:

# decode each bytestring element (UTF-8 by default) into a Python str
A2=np.array([x.decode() for x in A1])
np.savetxt("Test.txt", A2, fmt='%s', delimiter=',')

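A2 will have a unicode dtype such as dtype='<U10'; numpy sizes it to the longest decoded string, so for the test array below it is '<U10' rather than '<U100'.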

My test array is:

array([b'one.com', b'two.url', b'three.four'], dtype='|S10')

loaded from a simple text file:

one.com
two.url
three.four
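Putting the decode approach together on the test data above, here is a minimal sketch. It builds the bytestring array directly instead of calling loadtxt and writes to a temporary file instead of Test.txt; both are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

import numpy as np

# The test data from above, built directly instead of loaded with loadtxt.
A1 = np.array([b'one.com', b'two.url', b'three.four'], dtype='S10')

# Decode each bytes element to str; numpy sizes the new dtype to the longest string.
A2 = np.array([x.decode() for x in A1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Test.txt')
    np.savetxt(path, A2, fmt='%s', delimiter=',')
    # Read the file back to confirm that no b'' prefixes were written.
    with open(path, encoding='latin-1') as f:
        text = f.read()

print(A2.dtype)
print(text)
```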

.decode is a bytes method. [x.decode() for x in A1] works for a simple 1d array of bytestrings. If A1 is 2d, the iteration has to be done over all elements, not just the rows. And if A1 is a structured array, the decoding has to be applied to the string fields within the elements.
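For the 2d case, one way to iterate over all elements rather than the rows is to flatten, decode, and reshape. A minimal sketch; the array B is made-up data for illustration:

```python
import numpy as np

# A small hypothetical 2-D bytestring array.
B = np.array([[b'one', b'two'], [b'three', b'four']], dtype='S5')

# Iterating over B directly yields rows (ndarrays), which have no .decode(),
# so decode the flattened elements and then restore the original shape.
decoded = np.array([x.decode() for x in B.ravel()]).reshape(B.shape)
print(decoded)
```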

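Another possibility is to use a converter during load, so you get an array of (unicode) strings. This relies on loadtxt passing bytes to the converter; depending on the NumPy version it may pass str instead, in which case x.decode() fails.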

In [508]: A1=np.loadtxt('urls.txt', dtype='U',
    converters={0:lambda x:x.decode()})
In [509]: A1
Out[509]: 
array(['one.com', 'two.url', 'three.four'], dtype='<U10')
In [510]: np.savetxt('test0.txt',A1,fmt='%s')
In [511]: cat test0.txt
one.com
two.url
three.four
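Because what loadtxt hands to a converter may differ between NumPy versions (bytes in older releases, possibly str in newer ones), a converter that accepts either is somewhat more robust. A minimal sketch: to_str is a hypothetical helper name, io.StringIO stands in for urls.txt, and the fixed width 'U20' is an arbitrary choice.

```python
import io

import numpy as np

def to_str(field):
    """Hypothetical converter: decode bytes, pass str through unchanged.

    Older NumPy versions hand converters bytes; newer ones may hand them str.
    """
    return field.decode() if isinstance(field, bytes) else field

# io.StringIO stands in for the urls.txt file used above.
data = io.StringIO("one.com\ntwo.url\nthree.four\n")
A1 = np.loadtxt(data, dtype='U20', converters={0: to_str})
print(A1)
```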

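The module that contains loadtxt (np.lib.npyio) has a couple of converter functions: asbytes, asbytes_nested, and asstr. So converters could also be converters={0:np.lib.npyio.asstr}. These are internal helpers rather than documented API, so they may not be available in every NumPy release.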

genfromtxt handles this without converters:

 A1=np.genfromtxt('urls.txt', dtype='U')
 # array(['one.com', 'two.url', 'three.four'], dtype='<U10')

To understand why savetxt saves unicode strings as we want, but adds the b'' for bytestrings, we have to look at its code.

np.savetxt (running on py3) is essentially:

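fh = open(fname, 'wb')
X = np.atleast_2d(X).T   # a 1d X becomes a single column
# make a 'fmt' that matches the columns of X (with delimiters)
for row in X:
    # format the row as a str, then encode it to bytes (latin1) before writing
    fh.write(asbytes(format % tuple(row) + newline))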
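To see both cases side by side, here is a small re-creation of that loop. savetxt_lines is a hypothetical helper, not part of NumPy; it only mirrors the format-then-encode step sketched above (encoding with latin1, as asbytes does) for a 1d array.

```python
import numpy as np

def savetxt_lines(X, fmt='%s', newline='\n'):
    """Hypothetical helper mimicking the core loop of np.savetxt for a 1-D array:
    each row is %-formatted into a str, then encoded to latin1 bytes."""
    X = np.atleast_2d(X).T  # a 1-D input becomes a single column
    return [(fmt % tuple(row) + newline).encode('latin1') for row in X]

# '%s' applied to a bytes element calls str() on it, which keeps the b'' prefix.
print(savetxt_lines(np.array(['one.com'])))   # [b'one.com\n']
print(savetxt_lines(np.array([b'one.com'])))  # [b"b'one.com'\n"]
```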

Looking at two sample strings (str and bytestr):

In [617]: asbytes('%s'%tuple(['one.two']))
Out[617]: b'one.two'

In [618]: asbytes('%s'%tuple([b'one.two']))
Out[618]: b"b'one.two'"

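Writing those bytes to a 'wb' file stores only their contents, so the outer b'' (which is just how Python displays a bytes object) disappears, while the inner b'...' that %s produced from the bytestring is part of the text and stays. It also explains why strings ('plain' py3 unicode) are written to the file as 'latin1'-encoded text: asbytes encodes with latin1.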

You could write a bytestrings array directly, without savetxt. For example:

A0 = np.array([b'one.com', b'two.url', b'three.four'], dtype='|S10')
with open('test0.txt','wb') as f:
    for x in A0:
        f.write(x+b'\n')

cat test0.txt
    one.com
    two.url
    three.four

Unicode strings can also be written directly, producing the same file:

A1 = np.array(['one.com', 'two.url', 'three.four'], dtype='<U10')
with open('test1.txt','w') as f:
    for x in A1:
        f.write(x+'\n')

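The default encoding for such a file depends on the platform's locale settings (UTF-8 on most Linux and macOS systems, often something else on Windows), so pass encoding='UTF-8' to match what 'one.com'.encode() uses. For ASCII text like this, the effect is the same as what savetxt does: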

with open('test1.txt','wb') as f:
    for x in A1:
        f.write(x.encode()+b'\n')
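To check the claim that writing bytes and writing unicode produce the same file, here is a minimal sketch that writes the bytes array in binary mode and the unicode array in text mode, then compares the raw bytes. It passes encoding='utf-8' and newline='\n' explicitly (an assumption: without them the locale encoding and, on Windows, \r\n line endings could make the files differ) and uses a temporary directory instead of the file names above.

```python
import os
import tempfile

import numpy as np

A0 = np.array([b'one.com', b'two.url', b'three.four'], dtype='S10')
A1 = np.array(['one.com', 'two.url', 'three.four'], dtype='U10')

with tempfile.TemporaryDirectory() as tmp:
    p0 = os.path.join(tmp, 'test0.txt')
    p1 = os.path.join(tmp, 'test1.txt')

    # Bytestrings: binary mode, the elements are already bytes.
    with open(p0, 'wb') as f:
        for x in A0:
            f.write(x + b'\n')

    # Unicode strings: text mode, with the encoding and line ending pinned down.
    with open(p1, 'w', encoding='utf-8', newline='\n') as f:
        for x in A1:
            f.write(x + '\n')

    with open(p0, 'rb') as f0, open(p1, 'rb') as f1:
        raw0, raw1 = f0.read(), f1.read()

print(raw0 == raw1)
```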

np.char has encode and decode functions, which apply str.encode and bytes.decode to each element of an array.

So:

 np.char.decode(A0)   # convert |S10 to <U10, like [x.decode() for x in A0]
 np.char.encode(A1)   # convert <U10 to |S10
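A quick round trip on the same test data, as a sketch. Both functions default to UTF-8 here; pass encoding= (for example 'latin1') if the file was written with a different encoding.

```python
import numpy as np

A0 = np.array([b'one.com', b'two.url', b'three.four'], dtype='S10')

U = np.char.decode(A0)  # element-wise bytes.decode(): bytes array -> unicode array
S = np.char.encode(U)   # element-wise str.encode(): unicode array -> bytes array

print(U)
print(U.dtype.kind, S.dtype.kind)  # U S
```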

This works for multidimensional arrays too:

 # A_bytes: a 1d bytestring array like A0; [:,None][:,[0,0]] makes a 2-column test array
 np.savetxt('testm.txt', np.char.decode(A_bytes[:,None][:,[0,0]]),
     fmt='%s', delimiter=',  ')

With a structured array, np.char.decode has to be applied individually to each of the char fields.
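A minimal sketch of that per-field approach. The structured array rec, its field names, and the choice to give each decoded field as many characters as it had bytes (enough, since UTF-8 never yields more characters than bytes) are illustrative assumptions.

```python
import numpy as np

# Hypothetical structured array with one bytestring field and one integer field.
rec = np.array([(b'one.com', 1), (b'two.url', 2)],
               dtype=[('name', 'S10'), ('n', 'i4')])

# Build a matching dtype in which every bytes ('S') field becomes unicode ('U').
new_dtype = [(name, 'U%d' % rec.dtype[name].itemsize)
             if rec.dtype[name].kind == 'S' else (name, rec.dtype[name])
             for name in rec.dtype.names]

out = np.empty(rec.shape, dtype=new_dtype)
for name in rec.dtype.names:
    if rec.dtype[name].kind == 'S':
        out[name] = np.char.decode(rec[name])  # decode each char field separately
    else:
        out[name] = rec[name]                  # copy non-string fields unchanged

print(out)
```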
