Creating a numpy array from list of lists, with various data types

Question

I'd like to create a numpy array from a list of lists. The data types should be float, float, string. Why doesn't this work? (Note: I already read this question.)

import numpy

# Python 2 print statement, as in the original question
print numpy.array([[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']], dtype='f,f,str')

Output:

[[(4.2245014868923476e-39, 7.006492321624085e-44, '')
  (4.2245014868923476e-39, 7.146622168056567e-44, '')
  (9.275530846997402e-39, 9.918384925297198e-39, '')]
 [(4.2245014868923476e-39, 7.286752014489049e-44, '')
  (4.2245014868923476e-39, 7.42688186092153e-44, '')
  (9.642872831629367e-39, 0.0, '')]]

Recommended answer

As stressed in my previous answer and in the comments, the normal input for a compound (structured) dtype is a list of tuples, one tuple per record. To put it bluntly, that's how np.array is designed to work.

In [308]: numpy.array([[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']], dtype='f,f,str')
TypeError: a bytes-like object is required, not 'str'

With a list of tuples, and an improved dtype (64-bit floats, and a string field with an explicit length of up to 10 unicode characters):

In [311]: numpy.array([(u'1.2', u'1.3', u'hello'), (u'1.4', u'1.5', u'hi')], dtype='f8,f8,U10')
Out[311]: 
array([( 1.2,  1.3, 'hello'), ( 1.4,  1.5, 'hi')],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<U10')])
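Why the tuples matter: np.array reads nested lists as extra array dimensions, while under a structured dtype it reads each tuple as one record. A minimal sketch of the difference; the names `rows`, `plain` and `recs` are illustrative only.

```python
import numpy as np

rows = [[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']]
dt = np.dtype('f8,f8,U10')

# Nested lists: both levels become dimensions, giving a 2-D string array
plain = np.array(rows)
print(plain.shape, plain.dtype)  # (2, 3) <U5

# Tuples with a structured dtype: each tuple becomes one record
recs = np.array([tuple(r) for r in rows], dtype=dt)
print(recs.shape)  # (2,)
print(recs['f0'], recs['f2'])
```

The 2-D shape in the first case is also why the question's output has 2 x 3 records instead of 2.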

A possible way around the list-of-tuples requirement (I can't test it right now):

1. Make a zeros array of the right shape and dtype.
2. Make an object array from the list of lists (or a 2d array of strings).
3. Assign columns of the 2d array to fields of the structured array (a loop).
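The three steps above can be sketched as a small function. This is a minimal sketch, not the answer's own code: the helper name `fill_fields` is hypothetical, and it follows step 2 literally by building an object array (the session further down uses a string array instead).

```python
import numpy as np

def fill_fields(rows, dt):
    """Build a structured array by copying one column per field (hypothetical helper)."""
    # Step 1: a zeros structured array with one record per row
    res = np.zeros(len(rows), dtype=dt)
    # Step 2: a 2-D object array holding the raw Python values
    temp = np.array(rows, dtype=object)
    # Step 3: loop over the (few) fields, casting each whole column at once
    for i, name in enumerate(dt.names):
        res[name] = temp[:, i]
    return res

rows = [[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']]
res = fill_fields(rows, np.dtype('f8,f8,U10'))
print(res)
```

The Python-level loop runs once per field, so its length does not grow with the number of rows.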

Looping on the few fields is usually faster than looping on the many records.
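For contrast, a record-by-record loop assigns one tuple per row, so the Python-level loop runs once per record instead of once per field. A minimal sketch; the function name `fill_records` is hypothetical.

```python
import numpy as np

def fill_records(rows, dt):
    """Build a structured array one record at a time (hypothetical helper)."""
    res = np.zeros(len(rows), dtype=dt)
    # One Python-level assignment per record; each tuple fills every field of that record
    for i, row in enumerate(rows):
        res[i] = tuple(row)
    return res

rows = [[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']]
res = fill_records(rows, np.dtype('f8,f8,U10'))
print(res)
```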

But converting a list of lists into a list of tuples shouldn't be that expensive.

In [314]: alist = [[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']]
In [316]: dt = np.dtype('f8,f8,U10')

Creating the array from a list of tuples:

In [317]: np.array([tuple(a) for a in alist], dtype=dt)
Out[317]: 
array([( 1.2,  1.3, 'hello'), ( 1.4,  1.5, 'hi')],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<U10')])
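The same list-of-tuples construction also works with named fields, which can make the columns easier to read than the default `f0`, `f1`, `f2`. A minimal sketch; the field names `x`, `y` and `label` are illustrative assumptions, not from the question.

```python
import numpy as np

alist = [[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']]
# Hypothetical field names in place of the default f0, f1, f2
dt = np.dtype([('x', 'f8'), ('y', 'f8'), ('label', 'U10')])

arr = np.array([tuple(a) for a in alist], dtype=dt)
print(arr['x'])      # [1.2 1.4]
print(arr['label'])  # ['hello' 'hi']
```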

Setting the fields one at a time:

In [319]: res = np.zeros(len(alist), dtype=dt)
In [320]: temp = np.array(alist)    
In [321]: temp                    # default string dtype
Out[321]: 
array([['1.2', '1.3', 'hello'],
       ['1.4', '1.5', 'hi']],
      dtype='<U5')
In [322]: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...:     
In [323]: res
Out[323]: 
array([( 1.2,  1.3, 'hello'), ( 1.4,  1.5, 'hi')],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<U10')])

For this small case, the list-of-tuples approach is faster. With a much longer list, setting the fields might be faster, but that has to be tested:

In [325]: timeit np.array([tuple(a) for a in alist], dtype=dt)
6.26 µs ± 6.28 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [326]: %%timeit
     ...: res = np.zeros(len(alist), dtype=dt)
     ...: temp = np.array(alist)
     ...: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...: 
18.2 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

But even with many rows, tuple conversion is faster:

In [334]: arr = np.random.randint(0,100,(100000,3)).astype('U10')
In [335]: alist = arr.tolist()
In [336]: timeit np.array([tuple(a) for a in alist], dtype=dt)
93.5 ms ± 322 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [337]: %%timeit
     ...: res = np.zeros(len(alist), dtype=dt)
     ...: temp = np.array(alist)
     ...: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...: 
124 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
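To repeat this comparison outside IPython, the standard-library `timeit` module can time both approaches. A minimal sketch that mirrors the session above; the function names `via_tuples` and `via_fields` are hypothetical, and the timings will vary with machine and NumPy version.

```python
import timeit
import numpy as np

dt = np.dtype('f8,f8,U10')
# Same kind of test data as the session above: 100000 rows of 3 numeric strings
alist = np.random.randint(0, 100, (100000, 3)).astype('U10').tolist()

def via_tuples():
    # Convert each row to a tuple so np.array reads it as one record
    return np.array([tuple(a) for a in alist], dtype=dt)

def via_fields():
    # Build a 2-D string array, then copy one column per field
    res = np.zeros(len(alist), dtype=dt)
    temp = np.array(alist)
    for i, n in enumerate(dt.names):
        res[n] = temp[:, i]
    return res

for fn in (via_tuples, via_fields):
    # Best of 3 single-call runs, reported in milliseconds
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per call")
```

Taking the minimum of several repeats, as the `timeit` documentation suggests, reduces noise from other processes.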

Pulling the tuple comprehension out of the timing loop saves some time:

In [341]: %%timeit temp = [tuple(a) for a in alist]
     ...: np.array(temp, dtype=dt)
     ...: 
65.4 ms ± 98.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Pulling the str array creation out of the timing:

In [342]: %%timeit temp = np.array(alist)
     ...: res = np.zeros(len(alist), dtype=dt)
     ...: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...: 
71 ms ± 447 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

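Simply creating a string array from the list is more expensive than the tuple conversion: judging by the differences between the timings above, about 53 ms (124 - 71) versus about 28 ms (93.5 - 65.4).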
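The two preparation steps can also be timed on their own to check this directly. A minimal sketch with the same kind of random test data; absolute numbers depend on the machine and NumPy version.

```python
import timeit
import numpy as np

# Same kind of test data as above: 100000 rows of 3 numeric strings
alist = np.random.randint(0, 100, (100000, 3)).astype('U10').tolist()

# Preparation step of the tuple approach: list of lists -> list of tuples
t_tuples = min(timeit.repeat(lambda: [tuple(a) for a in alist], number=1, repeat=3))
# Preparation step of the field approach: list of lists -> 2-D string array
t_strarr = min(timeit.repeat(lambda: np.array(alist), number=1, repeat=3))

print(f"tuple conversion:   {t_tuples * 1e3:.1f} ms")
print(f"string array build: {t_strarr * 1e3:.1f} ms")
```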