Set diagonal triangle in pandas DataFrame to NaN

Problem description

Given the following dataframe:

import pandas as pd
import numpy as np
a = np.arange(16).reshape(4, 4)
df = pd.DataFrame(data=a, columns=['a','b','c','d'])

I want to produce the following result:

df([[ NaN,  1,  2,  3],
    [ NaN,  NaN,  6,  7],
    [ NaN,  NaN,  NaN, 11],
    [ NaN,  NaN,  NaN,  NaN]])

So far I've tried using np.tril_indices, but it only works on the df converted back into a numpy array, and it only works for integer assignments (not np.nan):

il1 = np.tril_indices(4)
a[il1] = 0

gives:

array([[ 0,  1,  2,  3],
       [ 0,  0,  6,  7],
       [ 0,  0,  0, 11],
       [ 0,  0,  0,  0]])

...which is almost what I'm looking for, but it fails when assigning NaN:

ValueError: cannot convert float NaN to integer

Meanwhile:

df[il1] = 0

gives:

TypeError: unhashable type: 'numpy.ndarray'

So if I want to fill the bottom triangle of a dataframe with NaN: 1) does it have to be a numpy array, or can I do this with pandas directly? 2) Is there a way to fill the bottom triangle with NaN without using numpy.fill_diagonal and incrementing the offset row by row down the whole DataFrame?

Another failed solution: filling the diagonal of the np array with zeros, then masking on zero and reassigning to np.nan. This also converts zero values above the diagonal to NaN, when they should be preserved as zero!
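To make that pitfall concrete, here is a minimal sketch. It assumes a hypothetical array with one genuine zero placed above the diagonal (the array in the question has none there), and compares masking by value with assigning by position:

```python
import numpy as np

# Hypothetical data: a float array with a genuine zero above the diagonal, at (0, 1).
a = np.arange(16, dtype=float).reshape(4, 4)
a[0, 1] = 0.0

# Failed approach: zero the lower triangle, then turn every zero into NaN.
zeroed = a.copy()
zeroed[np.tril_indices(4)] = 0
zeroed[zeroed == 0] = np.nan
lost_zero = bool(np.isnan(zeroed[0, 1]))  # the genuine zero was overwritten too

# Position-based approach: assign NaN by index, never by value.
by_index = a.copy()
by_index[np.tril_indices(4)] = np.nan
kept_zero = bool(by_index[0, 1] == 0.0)  # the genuine zero survives

print(lost_zero, kept_zero)
```

Masking by value cannot tell a placeholder zero from real data, which is why assigning by position is the safer route here.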

Recommended answer

You need to cast a to float, because NaN is a float:

import numpy as np
import pandas as pd
a = np.arange(16).reshape(4, 4).astype(float)
print (a)
[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]]


il1 = np.tril_indices(4)
a[il1] = np.nan
print (a)
[[ nan   1.   2.   3.]
 [ nan  nan   6.   7.]
 [ nan  nan  nan  11.]
 [ nan  nan  nan  nan]]

df = pd.DataFrame(data=a, columns=['a','b','c','d'])
print (df)
    a    b    c     d
0 NaN  1.0  2.0   3.0
1 NaN  NaN  6.0   7.0
2 NaN  NaN  NaN  11.0
3 NaN  NaN  NaN   NaN
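On the first part of the question (whether this can be done on the DataFrame directly), one possible alternative is sketched below. It is not part of the answer above: it assumes a boolean mask built with np.triu and passed to DataFrame.where, and the names keep and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['a', 'b', 'c', 'd'])

# Boolean mask that is True strictly above the main diagonal (k=1),
# i.e. True for the cells that should keep their values.
keep = np.triu(np.ones(df.shape, dtype=bool), k=1)

# DataFrame.where keeps values where the mask is True and
# fills NaN everywhere else.
result = df.where(keep)
print(result)
```

Because NaN is a float, the columns of result are upcast from integer to float, just as the explicit astype(float) does in the numpy version.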
