How to multiply every column of one Pandas DataFrame with every column of another DataFrame efficiently?
Problem description
I'm trying to multiply two pandas DataFrames with each other. Specifically, I want to multiply every column of one with every column of the other.
The DataFrames are one-hot encoded, so they look like this:
col_1, col_2, col_3, ...
0 1 0
1 0 0
0 0 1
...
I could iterate through the columns with a for loop, but in Python that is computationally expensive, and I'm hoping there's an easier way.
One of the DataFrames has 500 columns; the other has 100.
This is the fastest version I've been able to write so far:
interact_pd = pd.DataFrame(index=df_1.index)
df1_columns = [column for column in df_1]
for column in df_2:
    col_pd = df_1[df1_columns].multiply(df_2[column], axis="index")
    interact_pd = interact_pd.join(col_pd, lsuffix='_' + column)
I iterate over each column in df_2, multiply all of df_1 by that column, and join the result onto interact_pd. I would rather not use a for loop, however, as it is computationally costly. Is there a faster way of doing it?
Example
df_1:
1col_1, 1col_2, 1col_3
0 1 0
1 0 0
0 0 1
df_2:
2col_1, 2col_2
0 1
1 0
0 0
interact_pd:
1col_1_2col_1, 1col_2_2col_1, 1col_3_2col_1, 1col_1_2col_2, 1col_2_2col_2, 1col_3_2col_2
0 0 0 0 1 0
1 0 0 0 0 0
0 0 0 0 0 0
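For reference, the example above can be reproduced without a Python-level loop using NumPy broadcasting. This is only a sketch, not the asker's code: the variable names `a`, `b`, `values`, and `names` are introduced here for illustration, and the column order (df_2's column varying slowest) is chosen to match the expected output shown above.

```python
import numpy as np
import pandas as pd

# Example frames from the question.
df_1 = pd.DataFrame([[0, 1, 0], [1, 0, 0], [0, 0, 1]],
                    columns=["1col_1", "1col_2", "1col_3"])
df_2 = pd.DataFrame([[0, 1], [1, 0], [0, 0]],
                    columns=["2col_1", "2col_2"])

a = df_1.to_numpy()  # shape (rows, 3)
b = df_2.to_numpy()  # shape (rows, 2)

# Broadcasting gives shape (rows, 2, 3) with entry [r, i, j] = b[r, i] * a[r, j].
# Flattening the last two axes keeps df_2's column on the outside,
# which matches the column order in the expected output.
values = (b[:, :, None] * a[:, None, :]).reshape(len(df_1), -1)
names = [f"{c1}_{c2}" for c2 in df_2.columns for c1 in df_1.columns]

interact_pd = pd.DataFrame(values, index=df_1.index, columns=names)
print(interact_pd)
```

The intermediate array holds rows x 100 x 500 values for the sizes mentioned in the question, so memory use grows with the product of the two column counts.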
Recommended answer
import numpy as np
import pandas as pd

# use numpy to get a pair of index arrays that map out every
# combination of columns from df_1 and columns of df_2
pidx = np.indices((df_1.shape[1], df_2.shape[1])).reshape(2, -1)
# use pandas to build a MultiIndex for the columns of
# the final output
lcol = pd.MultiIndex.from_product([df_1.columns, df_2.columns],
                                  names=[df_1.columns.name, df_2.columns.name])
# df_1.values[:, pidx[0]] slices df_1 values for every combination,
# likewise with df_2.values[:, pidx[1]];
# finally, marry up the product of the arrays with the MultiIndex
pd.DataFrame(df_1.values[:, pidx[0]] * df_2.values[:, pidx[1]],
             columns=lcol)
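To make the index trick concrete, here is a small sketch of what `np.indices(...).reshape(2, -1)` produces. The shape `(3, 2)` is a hypothetical stand-in for a df_1 with 3 columns and a df_2 with 2 columns.

```python
import numpy as np

# Hypothetical sizes: 3 columns in df_1, 2 columns in df_2.
pidx = np.indices((3, 2)).reshape(2, -1)

print(pidx[0])  # column positions into df_1: [0 0 1 1 2 2]
print(pidx[1])  # column positions into df_2: [0 1 0 1 0 1]
```

Each position k pairs column `pidx[0][k]` of df_1 with column `pidx[1][k]` of df_2, so df_1's column varies slowest. This is the same order that `pd.MultiIndex.from_product([df_1.columns, df_2.columns])` generates, which is why the labels line up with the values. Note that it differs from the order in the question's example, where df_2's column varies slowest.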
Code
from string import ascii_letters

import numpy as np
import pandas as pd

# random 0/1 test data: 26 columns in df_1, 52 columns in df_2
df_1 = pd.DataFrame(np.random.randint(0, 2, (1000, 26)), columns=list(ascii_letters[:26]))
df_2 = pd.DataFrame(np.random.randint(0, 2, (1000, 52)), columns=list(ascii_letters))

def pir1(df_1, df_2):
    pidx = np.indices((df_1.shape[1], df_2.shape[1])).reshape(2, -1)
    lcol = pd.MultiIndex.from_product([df_1.columns, df_2.columns],
                                      names=[df_1.columns.name, df_2.columns.name])
    return pd.DataFrame(df_1.values[:, pidx[0]] * df_2.values[:, pidx[1]],
                        columns=lcol)
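def Test2(DA, DB):
    # as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
    MA = DA.to_numpy()
    MB = DB.to_numpy()
    MM = np.zeros((len(MA), len(MA[0]) * len(MB[0])))
    Col = []
    for i in range(len(MB[0])):      # i indexes the columns of DB
        for j in range(len(MA[0])):  # j indexes the columns of DA
            MM[:, i * len(MA[0]) + j] = MA[:, j] * MB[:, i]
            # label as 1col_<DA column>_2col_<DB column>, as in the question's example
            Col.append('1col_' + str(j + 1) + '_2col_' + str(i + 1))
    return pd.DataFrame(MM, dtype=int, columns=Col)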
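The two approaches can be checked against each other and timed with a small harness along the following lines. This is a sketch under stated assumptions: `pairwise_indices` and `pairwise_loop` are illustrative re-implementations written for this comparison (not the functions above verbatim), the loop version deliberately uses the same column order as the index version so the outputs can be compared directly, and the data is freshly generated random 0/1 values. No timings are quoted here because they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

def pairwise_indices(df_1, df_2):
    """Fancy-indexing approach (df_1's column varies slowest)."""
    pidx = np.indices((df_1.shape[1], df_2.shape[1])).reshape(2, -1)
    cols = pd.MultiIndex.from_product([df_1.columns, df_2.columns])
    data = df_1.to_numpy()[:, pidx[0]] * df_2.to_numpy()[:, pidx[1]]
    return pd.DataFrame(data, columns=cols)

def pairwise_loop(df_1, df_2):
    """Explicit double loop producing the same column order."""
    a, b = df_1.to_numpy(), df_2.to_numpy()
    out = np.zeros((a.shape[0], a.shape[1] * b.shape[1]), dtype=a.dtype)
    for i in range(a.shape[1]):
        for j in range(b.shape[1]):
            out[:, i * b.shape[1] + j] = a[:, i] * b[:, j]
    cols = pd.MultiIndex.from_product([df_1.columns, df_2.columns])
    return pd.DataFrame(out, columns=cols)

# Random 0/1 data with the same shapes as the test frames above.
rng = np.random.default_rng(0)
df_1 = pd.DataFrame(rng.integers(0, 2, (1000, 26)),
                    columns=[f"a{i}" for i in range(26)])
df_2 = pd.DataFrame(rng.integers(0, 2, (1000, 52)),
                    columns=[f"b{i}" for i in range(52)])

# Both functions should yield identical frames.
same = pairwise_indices(df_1, df_2).equals(pairwise_loop(df_1, df_2))
print("identical results:", same)

# Average wall-clock time per call over a few repetitions.
for fn in (pairwise_indices, pairwise_loop):
    seconds = timeit.timeit(lambda: fn(df_1, df_2), number=5) / 5
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```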
Results