How to make a rectangular matrix square in a pandas DataFrame
Question
I have a matrix of the following form (not necessarily square):
     A    B    C    D
A    0  0.2  0.3  0.5
E  0.2  0.6  0.9  0.2
D  0.5  0.3  0.6    0
F  0.1  0.4  0.5  0.3
I would like to turn it into a square matrix as follows:
     A    B    C    D    E    F
A    0  0.2  0.3  0.5  0.2  0.1
B  0.2    0    0  0.3  0.6  0.4
C  0.3    0    0  0.6  0.9  0.5
D  0.5  0.3  0.6    0  0.2  0.3
E  0.2  0.6  0.9  0.2    0    0
F  0.1  0.4  0.5  0.3    0    0
In other words, I would like to expand both rows and columns so that the result is a symmetric square matrix (rows and columns in the same order), with missing values filled with 0.
I guessed there should be a way to do this easily and efficiently using the built-in functions of pandas, but I am not familiar with the package.
For convenience:
import pandas as pd

df = pd.DataFrame([[0, 0.2, 0.3, 0.5],
                   [0.2, 0.6, 0.9, 0.2],
                   [0.5, 0.3, 0.6, 0],
                   [0.1, 0.4, 0.5, 0.3]],
                  index=['A', 'E', 'D', 'F'],
                  columns=['A', 'B', 'C', 'D'])
Recommended answer
Just as you thought, you can definitely do this pretty concisely in pandas.
One way is to use the very nice combine_first method.
result = df.combine_first(df.T).fillna(0.0)
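For reference, here is a self-contained sketch of that one-liner applied to the example frame from the question. The names df and result follow the snippets above; the checks at the end are added for illustration and are not part of the original answer.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame([[0, 0.2, 0.3, 0.5],
                   [0.2, 0.6, 0.9, 0.2],
                   [0.5, 0.3, 0.6, 0],
                   [0.1, 0.4, 0.5, 0.3]],
                  index=['A', 'E', 'D', 'F'],
                  columns=['A', 'B', 'C', 'D'])

# combine_first aligns both frames on the union of their row and column
# labels, keeps the values of df where they exist, and falls back to the
# values of df.T everywhere else. Cells missing from both stay NaN until
# fillna replaces them with 0.
result = df.combine_first(df.T).fillna(0.0)

# Illustrative checks (assumptions of this sketch, not from the answer):
# the labels match on both axes and the values are mirrored.
assert list(result.index) == list(result.columns)
assert (result == result.T).all().all()
print(result)
```

Because the two frames have different labels, the union comes back sorted, which is why the rows and columns end up in the order A to F.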
However, in my testing with timeit this clocked in at 3.62 ms ± 29.2 µs per loop, which was actually slightly slower than the time I got for your method (3.5 ms ± 28.6 µs per loop).
By calculating this more directly in pandas with the update method, I was able to get it down to 2.04 ms ± 17.2 µs per loop (~1.7x as fast).
# Find the combination of both indices
full_index = df.index.union(df.columns)
# Resize the DataFrame to include all the rows and columns
all_data = df.reindex(labels=full_index, axis=0).reindex(labels=full_index, axis=1)
# Update any values we have from the transpose
all_data.update(all_data.T)
# Fill the missing entries
result = all_data.fillna(0.0)
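The steps above can be wrapped into a small reusable function. This is a minimal sketch: make_square and fill_value are hypothetical names introduced here, the two reindex calls are collapsed into one call with index= and columns=, and the transpose is copied before updating as a defensive choice that the original answer does not make.

```python
import pandas as pd


def make_square(df: pd.DataFrame, fill_value: float = 0.0) -> pd.DataFrame:
    """Hypothetical helper: expand df to a square frame mirrored across the diagonal."""
    # Union of row and column labels (sorted when the two differ).
    full_index = df.index.union(df.columns)
    # Resize both axes at once; cells that did not exist become NaN.
    squared = df.reindex(index=full_index, columns=full_index)
    # Copy the transpose so update does not read from the data it is writing to
    # (a defensive assumption of this sketch), then fill in mirrored values.
    squared.update(squared.T.copy())
    return squared.fillna(fill_value)


df = pd.DataFrame([[0, 0.2, 0.3, 0.5],
                   [0.2, 0.6, 0.9, 0.2],
                   [0.5, 0.3, 0.6, 0],
                   [0.1, 0.4, 0.5, 0.3]],
                  index=['A', 'E', 'D', 'F'],
                  columns=['A', 'B', 'C', 'D'])

square = make_square(df)

# On this input the result matches the combine_first version.
expected = df.combine_first(df.T).fillna(0.0)
assert (square == expected).all().all()
print(square)
```

One difference to keep in mind: where a cell exists in both the frame and its transpose, update takes the transposed value, while combine_first keeps the original one. The example data is already symmetric in the overlapping cells, so both approaches agree here.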
Honestly, I wasn't able to get as much of a performance improvement as I thought I might, but both pandas-based versions are a little more readable, to me at least.
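To reproduce the comparison on your own machine, something like the following sketch with the standard timeit module may be enough. The function names and the number of runs are assumptions made for illustration, and the figures will differ from those quoted above depending on hardware and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame([[0, 0.2, 0.3, 0.5],
                   [0.2, 0.6, 0.9, 0.2],
                   [0.5, 0.3, 0.6, 0],
                   [0.1, 0.4, 0.5, 0.3]],
                  index=['A', 'E', 'D', 'F'],
                  columns=['A', 'B', 'C', 'D'])


def with_combine_first(frame):
    # One-liner from the answer.
    return frame.combine_first(frame.T).fillna(0.0)


def with_update(frame):
    # reindex + update version from the answer.
    full_index = frame.index.union(frame.columns)
    all_data = frame.reindex(index=full_index, columns=full_index)
    all_data.update(all_data.T)
    return all_data.fillna(0.0)


runs = 200  # arbitrary choice for this sketch
t_combine = timeit.timeit(lambda: with_combine_first(df), number=runs)
t_update = timeit.timeit(lambda: with_update(df), number=runs)

print(f"combine_first: {t_combine / runs * 1e3:.3f} ms per call")
print(f"update:        {t_update / runs * 1e3:.3f} ms per call")
```

On a 4x4 frame most of the time is fixed pandas overhead, so it is worth repeating the measurement on data of realistic size before choosing between the two.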