Dividing each column by every other column and creating a new dataframe from the results

Question

In a pandas DataFrame (data from a CSV file), I am trying to add new columns (ratios) by dividing each column by every other column.

So far I am stuck at dividing all columns by the first one (I was hoping to iterate the process later).

ratio_df = df.join(df.div(df['Hv10'], axis=0), rsuffix='_new_ratio')

I took this code from a post that deals with a similar problem:

pandas dataframe create new columns and fill with calculated values from same df

I get the following error message:

ValueError: operands could not be broadcast together with shapes (235170,) (3618,) 

I am not sure why I am getting this error message, since I am dividing each column by another column (so the dimensions should be the same).

What am I doing wrong?

Is there a one-step process to generate all these new ratio columns?

I hope my description is clear.

Thanks!

Recommended answer

You're on the right track, but a join is not the right operation. You should be able to do this using pd.concat.

pd.concat([df.div(df[col], axis=0) for col in df.columns], axis=1) # each column with every other column
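
To make the row-wise alignment concrete, here is a minimal sketch on a hypothetical two-column frame (the column names `x` and `y` and the values are made up for illustration and are not from the question). `df.div(df[col], axis=0)` divides every column by `df[col]` row by row, and `pd.concat(..., axis=1)` places the resulting frames side by side.

```python
import pandas as pd

# Hypothetical toy data, chosen so the ratios are easy to check by hand.
df = pd.DataFrame({"x": [2.0, 4.0], "y": [1.0, 8.0]})

# One frame per divisor column, concatenated side by side.
ratios = pd.concat([df.div(df[col], axis=0) for col in df.columns], axis=1)

print(ratios.shape)            # (2, 4)
print(ratios.values.tolist())  # [[1.0, 0.5, 2.0, 1.0], [1.0, 2.0, 0.5, 1.0]]
```

The first two result columns are `x / x` and `y / x`, the last two are `x / y` and `y / y`, so the self-ratios (all ones) are included in this variant.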

If you want to avoid dividing a column by itself, you could use df.columns.difference:

# divide every other column by `col`, skipping col / col
pd.concat([df[df.columns.difference([col])].div(df[col], axis=0)
           for col in df.columns], axis=1)

You can also use df.add_suffix('_new_ratio') to add a suffix to the column names.
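
As a small illustration of the suffix idea, the sketch below handles the single-divisor case from the question (dividing everything by one column) and attaches the renamed ratios to the original frame with `pd.concat`. The toy frame and its column names are assumptions for the example, not the asker's data.

```python
import pandas as pd

# Hypothetical toy data; "A" plays the role of the divisor column.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 8.0]})

# Divide every column by "A" row-wise, rename, and attach to the original.
ratios = df.div(df["A"], axis=0).add_suffix("_new_ratio")
ratio_df = pd.concat([df, ratios], axis=1)

print(list(ratio_df.columns))  # ['A', 'B', 'A_new_ratio', 'B_new_ratio']
```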

MCVE:

import pandas as pd
import numpy as np

np.random.seed([3, 14])
df = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))

df

          A         B         C
0 -0.602923 -0.402655  0.302329
1 -0.524349  0.543843  0.013135
2 -0.326498  1.385076 -0.132454
3 -0.407863  1.302895 -0.604236
4 -0.243362 -0.211261 -2.056621
5  0.517868 -0.040749 -1.051875
6  0.607092 -2.230437 -0.610389
7  0.223345  0.841994 -1.564391
8  0.031653  0.655489 -0.288834
9 -0.467438  0.119117  1.519430

df_new = pd.concat([df[df.columns.difference([col])].div(df[col], axis=0)\
                           .add_suffix('_n_r') for col in df.columns], axis=1)
df_new

       B_n_r     C_n_r      A_n_r      C_n_r      A_n_r      B_n_r
0   0.667838 -0.501438   1.497369  -0.750838  -1.994263  -1.331845
1  -1.037176 -0.025050  -0.964156   0.024152 -39.919620  41.403685
2  -4.242213  0.405682  -0.235726  -0.095630   2.464987 -10.457000
3  -3.194442  1.481468  -0.313044  -0.463764   0.675006  -2.156269
4   0.868095  8.450867   1.151948   9.734958   0.118331   0.102723
5  -0.078686 -2.031166 -12.708707  25.813488  -0.492328   0.038739
6  -3.673971 -1.005432  -0.272185   0.273663  -0.994598   3.654123
7   3.769924 -7.004363   0.265257  -1.857959  -0.142768  -0.538225
8  20.708576 -9.125012   0.048289  -0.440639  -0.109589  -2.269430
9  -0.254830 -3.250547  -3.924192  12.755771  -0.307641   0.078396
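
Note that the result above contains duplicate column labels (for example, `B_n_r` appears twice), because a fixed suffix does not record which column was the divisor. One possible way around this, sketched here as an assumption rather than as part of the original answer, is to build the suffix from the divisor's name. The `_over_` naming scheme and the seed are purely illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed for the illustration
df = pd.DataFrame(np.random.randn(10, 3), columns=list("ABC"))

# For each divisor column, divide the remaining columns by it and
# tag the results with the divisor's name so every label is unique.
df_new = pd.concat(
    [
        df[df.columns.difference([col])]
        .div(df[col], axis=0)
        .add_suffix(f"_over_{col}")
        for col in df.columns
    ],
    axis=1,
)

print(list(df_new.columns))
# ['B_over_A', 'C_over_A', 'A_over_B', 'C_over_B', 'A_over_C', 'B_over_C']
```

With unique labels, a single ratio can be selected unambiguously, for example `df_new["B_over_A"]`.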
