如何有效地将一个Pandas数据框的每一列与另一数据框的每一列相乘? [英] How to multiply every column of one Pandas Dataframe with every column of another Dataframe efficiently?

查看:62
本文介绍了如何有效地将一个Pandas数据框的每一列与另一数据框的每一列相乘?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将两个熊猫数据帧彼此相乘.具体来说,我想将每一列与另一个df的每一列相乘.

I'm trying to multiply two pandas dataframes with each other. Specifically, I want to multiply every column with every column of the other df.

数据帧是一键编码的,因此它们看起来像这样:

The dataframes are one-hot encoded, so they look like this:

col_1, col_2, col_3, ...
 0      1      0 
 1      0      0 
 0      0      1 
 ...

我可以使用for循环遍历每列,但是在python中,这在计算上是昂贵的,我希望有一种更简单的方法.

I could just iterate through each of the columns using a for loop, but in python that is computationally expensive, and I'm hoping there's an easier way.

其中一个数据帧具有500列,另一个数据帧具有100列.

One of the dataframes has 500 columns, the other has 100 columns.

这是到目前为止我能写的最快的版本:

This is the fastest version that I've been able to write so far:

interact_pd = pd.DataFrame(index=df_1.index)
df1_columns = [column for column in df_1]
for column in df_2:
    col_pd = df_1[df1_columns].multiply(df_2[column], axis="index")
    interact_pd = interact_pd.join(col_pd, lsuffix='_' + column)

我遍历df_2中的每一列,并将所有df_1乘以该列,然后将结果附加到interact_pd.但是,我宁愿不使用for循环来执行此操作,因为这在计算上非常昂贵.有更快的方法吗?

I iterate over each column in df_2 and multiply all of df_1 by that column, then I append the result to interact_pd. I would rather not do it using a for loop however, as this is very computationally costly. Is there a faster way of doing it?

示例

df_1:

1col_1, 1col_2, 1col_3
 0      1      0 
 1      0      0 
 0      0      1 

df_2:

2col_1, 2col_2
 0      1       
 1      0       
 0      0      

interact_pd:

interact_pd:

1col_1_2col_1, 1col_2_2col_1,1col_3_2col_1, 1col_1_2col_2, 1col_2_2col_2,1col_3_2col_2

  0      0      0        0       1        0  
  1      0      0        0       0        0 
  0      0      0        0       0        0 

推荐答案

# use numpy to get a pair of indices that map out every
# combination of columns from df_1 and columns of df_2
pidx = np.indices((df_1.shape[1], df_2.shape[1])).reshape(2, -1)

# use pandas MultiIndex to create a nice MultiIndex for
# the final output
lcol = pd.MultiIndex.from_product([df_1.columns, df_2.columns],
                                  names=[df_1.columns.name, df_2.columns.name])

# df_1.values[:, pidx[0]] slices df_1 values for every combination
# like wise with df_2.values[:, pidx[1]]
# finally, I marry up the product of arrays with the MultiIndex
pd.DataFrame(df_1.values[:, pidx[0]] * df_2.values[:, pidx[1]],
             columns=lcol)

代码

from string import ascii_letters

df_1 = pd.DataFrame(np.random.randint(0, 2, (1000, 26)), columns=list(ascii_letters[:26]))
df_2 = pd.DataFrame(np.random.randint(0, 2, (1000, 52)), columns=list(ascii_letters))

def pir1(df_1, df_2):
    pidx = np.indices((df_1.shape[1], df_2.shape[1])).reshape(2, -1)

    lcol = pd.MultiIndex.from_product([df_1.columns, df_2.columns],
                                      names=[df_1.columns.name, df_2.columns.name])

    return pd.DataFrame(df_1.values[:, pidx[0]] * df_2.values[:, pidx[1]],
                        columns=lcol)

def Test2(DA,DB):
  MA = DA.as_matrix()
  MB = DB.as_matrix()
  MM = np.zeros((len(MA),len(MA[0])*len(MB[0])))
  Col = []
  for i in range(len(MB[0])):
    for j in range(len(MA[0])):
      MM[:,i*len(MA[0])+j] = MA[:,j]*MB[:,i]
      Col.append('1col_'+str(i+1)+'_2col_'+str(j+1))
  return pd.DataFrame(MM,dtype=int,columns=Col)

结果

这篇关于如何有效地将一个Pandas数据框的每一列与另一数据框的每一列相乘?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆