Filter numpy array of tuples


Question

The scikit-learn library has a brilliant example of data clustering: stock market structure. It works fine with US stocks. But when one adds tickers from other markets, numpy raises an error saying the arrays should have the same size. The complaint is legitimate: German stocks, for example, follow a different trading calendar, so their quote arrays have different lengths.

OK, after downloading the quotes I added a step that prepares the shared dates:

quotes = [quotes_historical_yahoo_ochl(symbol, d1, d2, asobject=True)
          for symbol in symbols]


def intersect(list_1, list_2):
    return list(set(list_1) & set(list_2))

dates_all = quotes[0].date
for q in quotes:
    dates_symbol = q.date
    dates_all = intersect(dates_all, dates_symbol)

Then I got stuck filtering the numpy array of tuples. Here are some attempts:

# for index, q in enumerate(quotes):
#     filtered = [i for i in q if i.date in dates_all]

#     quotes[index] = np.rec.array(filtered, dtype=q.dtype)
#     quotes[index] = np.asanyarray(filtered, dtype=q.dtype)
#
#     quotes[index] = np.where(a.date in dates_all for a in q)
#
#     quotes[index] = np.where(q[0].date in dates_all)

How do I apply a filter to a numpy array, or how do I properly convert the list of records (after filtering) back into a numpy recarray?

quotes[0].dtype:

'(numpy.record, [('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), ('d', '<f8'), ('open', '<f8'), ('close', '<f8'), ('high', '<f8'), ('low', '<f8'), ('volume', '<f8'), ('aclose', '<f8')])'

quotes[0].shape:

<class 'tuple'>: (261,)

Recommended answer

So quotes is a list of recarrays, and in dates_all you collect the intersection of the values in the date field across all of them.

I can recreate one such array with:

In [286]: dt=np.dtype([('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'),
     ...:              ('d', '<f8'), ('open', '<f8'), ('close', '<f8'), ('high', '<f8'),
     ...:              ('low', '<f8'), ('volume', '<f8'), ('aclose', '<f8')])
In [287]: arr=np.ones((2,), dtype=dt)  # 2 element structured array
In [288]: arr
Out[288]: 
array([(1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.),
       (1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.)], 
      dtype=[('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), ... ('aclose', '<f8')])
In [289]: type(arr[0])
Out[289]: numpy.void

Turn that into a recarray (I don't use those as much as plain structured arrays):

In [291]: np.rec.array(arr)
Out[291]: 
rec.array([(1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.),
 (1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.)], 
          dtype=[('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), .... ('aclose', '<f8')])

The dtype of the recarray displays slightly differently:

In [292]: _.dtype
Out[292]: dtype((numpy.record, [('date', 'O'), ('year', '<i2'), ('month', 'i1'), ....('aclose', '<f8')]))
In [293]: __.date
Out[293]: array([1, 1], dtype=object)

In any case, the date field is an array of objects, possibly datetime objects?
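To make the difference between the two array types concrete, here is a minimal sketch. It assumes the date field holds datetime.date objects and uses a hypothetical two-field dtype instead of the full eleven-field one; the prices are made up.

```python
import datetime
import numpy as np

# Hypothetical two-field layout; the real quote arrays have eleven fields.
dt = np.dtype([('date', 'O'), ('close', '<f8')])
arr = np.array([(datetime.date(2017, 1, 2), 10.0),
                (datetime.date(2017, 1, 3), 11.0)], dtype=dt)

# Same data exposed through the recarray interface.
rec = np.rec.array(arr)

# A structured array is indexed by field name; a recarray also allows
# attribute access to the same field.
by_name = arr['date']
by_attr = rec.date
same = all(a == b for a, b in zip(by_name, by_attr))
```

Either access style returns a plain object array of dates, so the filtering approaches in this answer apply to both kinds of array.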

q is one of these arrays; i is an element, and i.date is its date field.

 [i for i in q if i.date in dates_all]

So filtered is a list of recarray elements. np.stack does a better job of reassembling them into an array, and it works with the recarray too (alist below stands in for your dates_all):

np.stack([i for i in arr if i['date'] in alist])
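A self-contained version of the np.stack approach might look like the sketch below. The two-field dtype, the sample dates, and the prices are assumptions for illustration; alist plays the role of dates_all.

```python
import datetime
import numpy as np

# Hypothetical two-field layout with made-up prices.
dt = np.dtype([('date', 'O'), ('close', '<f8')])
arr = np.array([(datetime.date(2017, 1, 2), 10.0),
                (datetime.date(2017, 1, 3), 11.0),
                (datetime.date(2017, 1, 4), 12.0)], dtype=dt)
alist = [datetime.date(2017, 1, 2), datetime.date(2017, 1, 4)]

# Keep the matching elements, then reassemble them into one array.
kept = [row for row in arr if row['date'] in alist]
# np.stack raises ValueError on an empty list, so fall back to an empty slice.
filtered = np.stack(kept) if kept else arr[:0]
```

The guard for an empty list matters when two markets share no trading days in the requested range.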

Or you could collect the indices of the matching records and use them to index the quote array:

In [319]: [i for i,v in enumerate(arr) if v['date'] in alist]
Out[319]: [0, 1]
In [320]: arr[_]

Or pull out the date field first:

In [321]: [i for i,v in enumerate(arr['date']) if v in alist]
Out[321]: [0, 1]
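As a runnable sketch of the index-list approach, again with a hypothetical two-field dtype and made-up data, the collected positions can be passed straight back to the array. Here wanted is an assumed name standing in for dates_all, kept as a set so each membership test is cheap.

```python
import datetime
import numpy as np

# Hypothetical two-field layout with made-up prices.
dt = np.dtype([('date', 'O'), ('close', '<f8')])
arr = np.array([(datetime.date(2017, 1, 2), 10.0),
                (datetime.date(2017, 1, 3), 11.0),
                (datetime.date(2017, 1, 4), 12.0)], dtype=dt)
wanted = {datetime.date(2017, 1, 3), datetime.date(2017, 1, 4)}

# Positions of the records whose date is wanted.
idx = [i for i, d in enumerate(arr['date']) if d in wanted]

# Indexing with a list of integers returns a new array with the same dtype.
subset = arr[idx]
```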

in1d might also work for the search:

In [322]: np.in1d(arr['date'],alist)
Out[322]: array([ True,  True], dtype=bool)
In [323]: np.where(np.in1d(arr['date'],alist))
Out[323]: (array([0, 1], dtype=int32),)
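Putting the pieces together, a boolean mask can filter every array in quotes down to the shared dates. This is a sketch under stated assumptions: make_quotes is a hypothetical helper that fabricates small recarrays, the two-field dtype and prices are invented, and np.isin is used in place of np.in1d, which was deprecated in NumPy 2.0.

```python
import datetime
from functools import reduce
import numpy as np

# Hypothetical two-field layout; the real quote arrays have eleven fields.
dt = np.dtype([('date', 'O'), ('close', '<f8')])

def make_quotes(days, base):
    """Build a small recarray for the given days of January 2017 (made-up prices)."""
    rows = [(datetime.date(2017, 1, d), base + d) for d in days]
    return np.array(rows, dtype=dt).view(np.recarray)

# Two markets with different trading calendars.
quotes = [make_quotes([2, 3, 4, 5], 100.0),
          make_quotes([3, 4, 5, 6], 200.0)]

# Dates present in every array.
common = reduce(set.intersection, (set(q.date) for q in quotes))
common_dates = np.array(sorted(common), dtype=object)

# Boolean-mask indexing keeps the recarray type and dtype intact.
quotes = [q[np.isin(q.date, common_dates)] for q in quotes]
```

After this step every array has the same length and the same dates in the same order, provided each input was already sorted by date, which is what the clustering example needs.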
