Filled with numpy.zeros but getting nan instead

Question

I have a class with various methods, including the following:

def _doc_mean(self, doc):
    doc_vector_values = []
    for w in doc:
        #print(w)
        if w.lower().strip() in self._E:
            Q = np.zeros((1, 200), dtype=np.float64)   #this is a zero array for when a word doesnt have a vector representation in our pretrained embeddings
            doc_vector_values.append(self._E.get(w, Q))

        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=RuntimeWarning)
    return np.mean(np.array(doc_vector_values, dtype=np.float64), axis=0)

def fit(self, X, y=None):
    return self

def transform(self, X):
    return np.array([self._doc_mean(doc) for doc in X])

def fit_transform(self, X, y=None):
    return self.fit(X).transform(X)

In _doc_mean, I compare w with the keys in the dictionary E_. If there is a match, I load the value of the key-value pair, which contains a 1*200 vector, into a list; if there is no match, I load numpy.zeros((1,200)) into the list instead. The list is then converted to an array and the mean is calculated.

When I instantiate the class and fit-transform my 'doc' data:

mc = MeanClass()        
X_ = mc.fit_transform(doc)

X_ is of dtype "object", and the places where there was a mismatch are filled with nan instead of zeros.

This leads to multiple other problems in my code that I can't fix. What am I doing wrong?

The E_ dictionary looks like this:

{'hello': array([ 5.84850e-02,  6.20640e-02, ..... -2.08990e-02])
'good':  array([ -4.80050e-02,  2.80610e-02, ..... -5.04991e-02])

while doc looks like this:

['hello', 'bye', 'good']
['good', 'bye', 'night']

Recommended answer

Since you haven't given a [mcve], I'll create something simple:

In [125]: E_ = {'foo':np.arange(5), 'bar':np.arange(1,6), 'baz':np.arange(5,10)}                             
In [126]: doc = ['foo','bar','sub','baz','foo']    

Now do the dictionary lookup:

In [127]: alist = []                                                                                         
In [128]: for w in doc: 
     ...:     alist.append(E_.get(w,np.zeros((1,5),int))) 
     ...:                                                                                                    
In [129]: alist                                                                                              
Out[129]: 
[array([0, 1, 2, 3, 4]),
 array([1, 2, 3, 4, 5]),
 array([[0, 0, 0, 0, 0]]),
 array([5, 6, 7, 8, 9]),
 array([0, 1, 2, 3, 4])]
In [130]: np.array(alist)                                                                                    
Out[130]: 
array([array([0, 1, 2, 3, 4]), array([1, 2, 3, 4, 5]),
       array([[0, 0, 0, 0, 0]]), array([5, 6, 7, 8, 9]),
       array([0, 1, 2, 3, 4])], dtype=object)

The arrays in E_ all have shape (5,), but the 'fill' array has shape (1,5). Because of this shape mismatch, the Out[130] array is a 1d array of dtype object.
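
A minimal sketch of how such a mismatch could be caught before the array is built, reusing the E_ and doc defined above. The names bad_fill, shapes and can_stack are introduced here only for illustration. np.stack requires every input to have the same shape, so it raises ValueError rather than quietly producing an object array. Depending on the NumPy version, np.array(alist) itself may also refuse ragged input unless dtype=object is passed.

```python
import numpy as np

E_ = {'foo': np.arange(5), 'bar': np.arange(1, 6), 'baz': np.arange(5, 10)}
doc = ['foo', 'bar', 'sub', 'baz', 'foo']

bad_fill = np.zeros((1, 5), int)   # 2d fill, unlike the 1d dictionary values
alist = [E_.get(w, bad_fill) for w in doc]

# Collect the distinct shapes; more than one means the list is ragged.
shapes = {a.shape for a in alist}

# np.stack insists on identical shapes and raises ValueError otherwise.
try:
    np.stack(alist)
    can_stack = True
except ValueError:
    can_stack = False

print(shapes, can_stack)
```

Checking the set of shapes is cheap, and it points directly at the offending fill value.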

I think you are trying to avoid the 'fill' case, but you test w.lower().strip() in self._E, and then use w in the get. So you might get the Q value sometimes. I got it with the 'sub' string.
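
The key mismatch can be shown in isolation. This is a small sketch with a made-up one-word dictionary and a padded, capitalised word (both are assumptions chosen for illustration): the membership test passes because it uses the normalised word, yet the lookup with the raw word falls through to the fill value.

```python
import numpy as np

E_ = {'hello': np.ones(5)}   # hypothetical one-word embedding table
fill = np.zeros(5)
w = ' Hello '

# The membership test uses the normalised word, the lookup uses the raw word.
found_by_test = w.lower().strip() in E_
raw_lookup = E_.get(w, fill)

# Normalise once and reuse the same key for both steps.
key = w.lower().strip()
fixed_lookup = E_.get(key, fill)

print(found_by_test, raw_lookup, fixed_lookup)
```

Normalising the word once and reusing that key keeps the test and the lookup consistent.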

If instead I make the 'fill' be (5,):

In [131]: alist = []                                                                                         
In [132]: for w in doc: 
     ...:     alist.append(E_.get(w,np.zeros((5,),int))) 
     ...:                                                                                                    
In [133]: alist                                                                                              
Out[133]: 
[array([0, 1, 2, 3, 4]),
 array([1, 2, 3, 4, 5]),
 array([0, 0, 0, 0, 0]),
 array([5, 6, 7, 8, 9]),
 array([0, 1, 2, 3, 4])]
In [134]: np.array(alist)                                                                                    
Out[134]: 
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [0, 0, 0, 0, 0],
       [5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4]])

The result is a (n,5) numeric array.
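
Putting the pieces together, a possible rewrite of the averaging step might look like the sketch below. It is not the asker's class: doc_mean is a hypothetical standalone function, and the dim parameter and the choice to return zeros for an empty document are assumptions. The empty case matters because np.mean over an empty array returns nan, which is one plausible source of the nan values in the question if some documents contain no known words.

```python
import numpy as np

def doc_mean(doc, embeddings, dim):
    """Average the vectors of the words in doc; unknown words count as zeros."""
    # The fill has shape (dim,), the same as the stored vectors.
    fill = np.zeros(dim, dtype=np.float64)
    # Normalise each word once and use that key for the lookup.
    rows = [embeddings.get(w.lower().strip(), fill) for w in doc]
    if not rows:
        # Assumption: an empty document maps to a zero vector instead of nan.
        return fill
    return np.mean(np.stack(rows), axis=0)

E_ = {'foo': np.arange(5.0), 'bar': np.arange(1.0, 6.0), 'baz': np.arange(5.0, 10.0)}
docs = [['foo', 'bar', 'sub', 'baz', 'foo'], []]

X_ = np.array([doc_mean(d, E_, 5) for d in docs])
print(X_.shape, X_.dtype)
```

Because every row has the same length, the stacked result is an ordinary 2d float array rather than an object array.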

I can take two different means. One is the mean across all words, with a value for each 'attribute'. The other is the mean for each word, which I could just as well have gotten by taking the mean in E_.

In [135]: np.mean(_, axis=0)                                                                                 
Out[135]: array([1.2, 2. , 2.8, 3.6, 4.4])
In [137]: np.mean(__, axis=1)                                                                                
Out[137]: array([2., 3., 0., 7., 2.])   # mean for each 'word'

The mean of the object array in Out[130]:

In [138]: np.mean(_130, axis=0)                                                                              
Out[138]: array([[1, 2, 2, 3, 4]])

The result is (1,5) and looks like Out[135] truncated, but I'd have to dig a bit further to be sure.
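
One possible explanation, offered as a sketch and not as a verified account of what np.mean does for object arrays in every NumPy version: summing the five elements broadcasts to an integer (1,5) array, and if the division by the count is written back into that integer array, the fractional parts are dropped. The variable names below are illustrative.

```python
import numpy as np

foo, bar, baz = np.arange(5), np.arange(1, 6), np.arange(5, 10)
fill = np.zeros((1, 5), int)

# Adding the elements broadcasts (5,) against (1,5), giving an integer (1,5) sum.
total = foo + bar + fill + baz + foo

# True division written back into an integer array drops the fractional part.
truncated = total.copy()
np.true_divide(truncated, 5, out=truncated, casting='unsafe')

# Ordinary division returns a new float array instead.
exact = total / 5

print(truncated, exact)
```

Under these assumptions, truncated has the same values as Out[138] and exact has the same values as Out[135].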

Hopefully this gives you an idea of what to watch out for, and of the kind of 'minimal reproducible concrete example' that we find most useful.
