Pandas crosstab matrix dot nansum
Question
I'm looking for help creating a sub-dataframe from an existing dataframe using an np.nansum-like function. I want to convert this table into a matrix of non-null column sums:
     dan  ste  bob
t1    na    2   na
t2     2   na    1
t3     2    1   na
t4     1   na    2
t5    na    1    2
t6     2    1   na
t7     1   na    2
For example, when 'dan' is not null (t2, t3, t4, t6, t7), the sum of 'ste' is 2 and the sum of 'bob' is 5. When 'ste' is not null, the sum of 'dan' is 4. The expected result is:
     dan  ste  bob
dan    0    2    5
ste    4    0    2
bob    4    1    0
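The arithmetic behind one row of that matrix can be checked directly. This is only a sketch: it assumes the table is loaded into a DataFrame named `df` with `na` stored as NaN and `t1`..`t7` as the index, neither of which is stated in the question.

```python
import numpy as np
import pandas as pd

# The question's table, with "na" represented as NaN (an assumption).
df = pd.DataFrame(
    {
        "dan": [np.nan, 2, 2, 1, np.nan, 2, 1],
        "ste": [2, np.nan, 1, np.nan, 1, 1, np.nan],
        "bob": [np.nan, 1, np.nan, 2, 2, np.nan, 2],
    },
    index=["t1", "t2", "t3", "t4", "t5", "t6", "t7"],
)

# Keep only the rows where "dan" is present, then sum the other columns.
# Series.sum() skips NaN by default, which gives the nansum-like behaviour.
dan_row = df.loc[df["dan"].notna(), ["ste", "bob"]].sum()
print(dan_row)
```

The two sums are the off-diagonal entries of the 'dan' row: 2 for 'ste' and 5 for 'bob'.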
Any ideas?
Thanks in advance!
I ended up using a modified version of matt's function below:
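import pandas as pd

def nansum_matrix_create(df):
    # For each column, keep the rows where it is non-zero and sum every column.
    # Note: NaN != 0 evaluates to True, so this filter only excludes missing
    # values stored as 0; use df[col].notna() if they are stored as NaN.
    rows = []
    for col in df.columns:
        col_sums = df[df[col] != 0].sum()
        rows.append(col_sums)
    return pd.DataFrame(rows, columns=df.columns, index=df.columns)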
Recommended answer
Assuming your dataframe doesn't have a large number of columns, this function should do what you want and be fairly performant. I implemented it with a for loop across columns, so there may be a more performant or elegant solution out there.
import numpy as np
import pandas as pd

# Initialise the dataframe (np.nan marks the missing values)
data = {"dan": [np.nan, 2, 2, 1, np.nan, 2, 1],
        "ste": [2, np.nan, 1, np.nan, 1, 1, np.nan],
        "bob": [np.nan, 1, np.nan, 2, 2, np.nan, 2]}
df = pd.DataFrame(data)[["dan", "ste", "bob"]]

def matrix_create(df):
    rows = []
    for col in df.columns:
        # Boolean mask of the rows where `col` has a value
        not_null = df[col].notna()
        subvals = []
        for subcol in df.columns:
            if subcol == col:
                # A column is not summed against itself
                subvals.append(0)
            else:
                # Series.sum() skips NaN by default
                subvals.append(df.loc[not_null, subcol].sum())
        rows.append(subvals)
    return pd.DataFrame(rows, columns=df.columns, index=df.columns)

matrix_create(df)
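Since the title mentions a matrix dot product, the same table can also be sketched without explicit loops: multiply the transposed "is present" mask by the values with NaN replaced by 0. The helper name `nansum_matrix_dot` is hypothetical and not part of the original answer. The sketch assumes missing values are NaN and that the diagonal should be 0, as in the expected output.

```python
import numpy as np
import pandas as pd


def nansum_matrix_dot(df):
    """Hypothetical helper: conditional column sums via one matrix product."""
    # 1.0 where a value is present, 0.0 where it is NaN
    present = df.notna().to_numpy(dtype=float)
    # Missing values contribute nothing to a sum
    values = df.fillna(0).to_numpy(dtype=float)
    # Entry (i, j) = sum of column j over the rows where column i is present
    sums = present.T @ values
    # Match the expected output, which has 0 on the diagonal
    np.fill_diagonal(sums, 0.0)
    return pd.DataFrame(sums, index=df.columns, columns=df.columns)


df = pd.DataFrame(
    {
        "dan": [np.nan, 2, 2, 1, np.nan, 2, 1],
        "ste": [2, np.nan, 1, np.nan, 1, 1, np.nan],
        "bob": [np.nan, 1, np.nan, 2, 2, np.nan, 2],
    }
)
result = nansum_matrix_dot(df)
print(result)
```

This does the work in a single matrix multiplication, which may help when there are many columns. The result holds floats rather than integers.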