Add new items to some structured array in a dictionary-like way
Problem description
I want to extend the structured array object in numpy such that I can easily add new elements.
For example, for a simple structured array
>>> import numpy as np
>>> x=np.ndarray((2,),dtype={'names':['A','B'],'formats':['f8','f8']})
>>> x['A']=[1,2]
>>> x['B']=[3,4]
I would like to easily add a new element x['C']=[5,6], but then an error appears associated with the undefined name 'C'.
Just adding a new method to np.ndarray works:
import numpy as np

class sndarray(np.ndarray):
    def column_stack(self, i, x):
        formats = ['f8'] * len(self.dtype.names)
        new = sndarray(shape=self.shape,
                       dtype={'names': list(self.dtype.names) + [i],
                              'formats': formats + ['f8']})
        for key in self.dtype.names:
            new[key] = self[key]
        new[i] = x
        return new
Then,
>>> x=sndarray((2,),dtype={'names':['A','B'],'formats':['f8','f8']})
>>> x['A']=[1,2]
>>> x['B']=[3,4]
>>> x=x.column_stack('C',[4,4])
>>> x
sndarray([(1.0, 3.0, 4.0), (2.0, 4.0, 4.0)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
Is there any way the new element could be added in a dictionary-like way? For example:
>>> x['C']=[4,4]
>>> x
sndarray([(1.0, 3.0, 4.0), (2.0, 4.0, 4.0)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
Update:
By using __setitem__ I am still one step away from the ideal solution, because I don't know how to change the object referenced by self:
import numpy as np

class sdarray(np.ndarray):
    def __setitem__(self, i, x):
        if i in self.dtype.names:
            super(sdarray, self).__setitem__(i, x)
        else:
            formats = ['f8'] * len(self.dtype.names)
            new = sdarray(shape=self.shape,
                          dtype={'names': list(self.dtype.names) + [i],
                                 'formats': formats + ['f8']})
            for key in self.dtype.names:
                new[key] = self[key]
            new[i] = x
            self.with_new_column = new
Then
>>> x=sdarray((2,),dtype={'names':['A','B'],'formats':['f8','f8']})
>>> x['A']=[1,2]
>>> x['B']=[3,4]
>>> x['C']=[4,4]
>>> x=x.with_new_column #extra ugly step!
>>> x
sdarray([(1.0, 3.0, 4.0), (2.0, 4.0, 4.0)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
Update 2
After the right implementation in the selected answer, I figured out that the problem is already solved by the pandas DataFrame object:
>>> import numpy as np
>>> import pandas as pd
>>> x=np.ndarray((2,),dtype={'names':['A','B'],'formats':['f8','f8']})
>>> x=pd.DataFrame(x)
>>> x['A']=[1,2]
>>> x['B']=[3,4]
>>> x['C']=[4,4]
>>> x
   A  B  C
0  1  3  4
1  2  4  4
>>>
Use numpy.recarray instead; in my numpy 1.6.1 you get an extra method, field, that does not exist when you subclass from numpy.ndarray.
This question or this one (if using numpy 1.3) also discuss adding a field to a structured array. From there you will see that using:
import numpy.lib.recfunctions as rf
rf.append_fields( ... )
can greatly simplify your life. At first glance I thought this function would append to the original array, but it creates a new instance instead. The class shown below is using your solution for __setitem__(), which is working very well.
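As a minimal sketch of the append_fields call mentioned above, the snippet below adds a field 'C' to the two-field array from the question. The choice of np.zeros and of usemask=False (to get a plain ndarray back rather than a masked array) is an assumption made for illustration, not something taken from the original answer.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Same two-field array as in the question.
x = np.zeros((2,), dtype={'names': ['A', 'B'], 'formats': ['f8', 'f8']})
x['A'] = [1, 2]
x['B'] = [3, 4]

# append_fields returns a NEW array; x itself keeps its two fields.
# usemask=False asks for a plain ndarray instead of a masked array.
y = rf.append_fields(x, 'C', np.array([4.0, 4.0]), usemask=False)

print(x.dtype.names)  # ('A', 'B')
print(y.dtype.names)  # ('A', 'B', 'C')
```

Because the result is a new array, the caller still has to rebind the name (x = rf.append_fields(x, ...)), which is the same limitation as the column_stack method in the question.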
The issue you found that led you to the ugly solution was reported in another question. The problem is that when you do self=... you are just storing the new object in a variable, but the entity sdarray is not being updated. Maybe it is possible to directly destroy and reconstruct the class from inside its method, but based on that discussion the following class can be created, in which ndarray is not subclassed, but stored and called internally. Some other methods were added to make it work and look like you are working directly with ndarray. I did not test it in detail.
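The point about self=... can be illustrated with a small sketch that does not involve numpy. Box is a hypothetical class invented for this example: assigning to self inside a method only rebinds a local name, while assigning to an attribute of self changes the object the caller holds.

```python
class Box(object):  # hypothetical class, for illustration only
    def __init__(self, value):
        self.value = value

    def rebind(self, value):
        # Rebinds the local name `self` only; the caller's object is untouched.
        self = Box(value)

    def mutate(self, value):
        # Changes state stored on the existing object; the caller sees it.
        self.value = value


b = Box(1)
b.rebind(2)
after_rebind = b.value  # still 1
b.mutate(3)
after_mutate = b.value  # now 3
```

This is why storing the array as an attribute of a wrapper object, and replacing that attribute, works where reassigning self does not.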
For automatic resizing, a good solution has been presented here. You can also incorporate it in your code.
import numpy as np

class sdarray(object):
    def __init__(self, *args, **kwargs):
        # The recarray is stored internally instead of being subclassed.
        self.recarray = np.recarray(*args, **kwargs)

    def __getattr__(self, attr):
        # Only reached when normal lookup fails; delegate to the wrapped recarray.
        if attr != 'recarray' and hasattr(self.recarray, attr):
            return getattr(self.recarray, attr)
        # Raise instead of calling getattr(self, attr), which would recurse forever.
        raise AttributeError(attr)

    def __len__(self):
        return self.recarray.__len__()

    def __add__(self, other):
        return self.recarray.__add__(other)

    def __sub__(self, other):
        return self.recarray.__sub__(other)

    def __mul__(self, other):
        return self.recarray.__mul__(other)

    def __rmul__(self, other):
        return self.recarray.__rmul__(other)

    def __getitem__(self, i):
        return self.recarray.__getitem__(i)

    def __str__(self):
        return self.recarray.__str__()

    def __repr__(self):
        return self.recarray.__repr__()

    def __setitem__(self, i, x):
        keys = []
        formats = []
        if i in self.dtype.names:
            self.recarray.__setitem__(i, x)
        else:
            # Iterate over dtype.names so the field order is preserved
            # (dict.iteritems() no longer exists in Python 3).
            for name in self.dtype.names:
                keys.append(name)
                formats.append(self.dtype.fields[name][0])
            keys.append(i)
            # The new field reuses the format of the last existing field.
            formats.append(formats[-1])
            new = np.recarray(shape=self.shape,
                              dtype={'names': keys,
                                     'formats': formats})
            for k in keys[:-1]:
                new[k] = self[k]
            new[i] = x
            # Replace the wrapped array; the sdarray object itself stays the same.
            self.recarray = new
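For comparison, here is a more compact sketch of the same wrapper idea that leaves the dtype bookkeeping to append_fields. FieldArray and its array attribute are hypothetical names, not part of numpy, and only item access is wrapped; it is meant as an illustration of the design, not a drop-in replacement for the class above.

```python
import numpy as np
import numpy.lib.recfunctions as rf


class FieldArray(object):  # hypothetical name, not part of numpy
    def __init__(self, array):
        self.array = array  # the wrapped structured array

    def __getitem__(self, key):
        return self.array[key]

    def __setitem__(self, key, value):
        names = self.array.dtype.names
        if isinstance(key, str) and key not in names:
            # Unknown field name: build a new array with the extra field
            # and swap it in, so the wrapper object itself stays the same.
            self.array = rf.append_fields(
                self.array, key, np.asarray(value), usemask=False)
        else:
            self.array[key] = value


x = FieldArray(np.zeros((2,), dtype=[('A', 'f8'), ('B', 'f8')]))
x['A'] = [1, 2]
x['B'] = [3, 4]
x['C'] = [4, 4]  # adds a new field without rebinding x
print(x.array.dtype.names)
```

Unlike the class above, the new field here takes its dtype from the assigned values rather than from the last existing field.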