Element-wise division by rows between a DataFrame and a Series


Problem description

I started using pandas a few weeks ago and am now trying to perform an element-wise division on rows, but I can't figure out the proper way to do it. Here is my case and data:

          date  type    id     ...            1096        1097        1098
0   2014-06-13   cal     1     ...       17.949524   16.247619   15.465079
1   2014-06-13   cow    32     ...        0.523429   -0.854286   -1.520952
2   2014-06-13   cow    47     ...        7.676000    6.521714    5.892381
3   2014-06-13   cow   107     ...        4.161714    3.048571    2.419048
4   2014-06-13   cow   137     ...        3.781143    2.557143    1.931429
5   2014-06-13   cow   255     ...        3.847273    2.509091    1.804329
6   2014-06-13   cow   609     ...        6.097714    4.837714    4.249524
7   2014-06-13   cow   721     ...        3.653143    2.358286    1.633333
8   2014-06-13   cow   817     ...        6.044571    4.934286    4.373333
9   2014-06-13   cow   837     ...        9.649714    8.511429    7.884762
10  2014-06-13   cow   980     ...        1.817143    0.536571   -0.102857
11  2014-06-13   cow  1730     ...        8.512571    7.114286    6.319048
12  2014-06-13  dark     1     ...      168.725714  167.885715  167.600001

>>> my_data.columns
Index(['date', 'type', 'id', '188', '189', '190', '191', '192', '193', '194',
       ...
       '1089', '1090', '1091', '1092', '1093', '1094', '1095', '1096', '1097',
       '1098'],
      dtype='object', length=914)

My goal is to divide every row by the row with "type" == "cal", but only across the columns from '188' to '1098' (911 columns).

These are the approaches I have tried:

Extracting the row of interest and using it with apply(), divide() and the '/' operator:

>>> cal_r = my_data[my_data["type"]=="cal"].iloc[:,3:]
>>> my_data.apply(lambda x: x.iloc[3:]/cal_r, axis=1)
0       188 189 190 191 192 193 194 195 ...  1091 10...
1          188      189      190    ...           10...
2           188      189      190    ...         109...
3           188      189      190   ...         1096...
4          188      189   190      191   ...        ...
5            188      189      190    ...         10...
6           188      189      190    ...         109...
7          188      189      190    ...         1096...
8          188      189      190    ...         1096...
9          188      189  190    ...         1096    ...
10          188      189      190     ...          1...
11          188      189      190    ...         109...
12         188      189      190      191   ...     ...
dtype: object

>>> my_data.apply(lambda x: x.iloc[3:].divide(cal_r,axis=1), axis=1)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 6014, in apply
    return op.get_result()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 142, in get_result
    return self.apply_standard()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "<input>", line 1, in <lambda>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/ops.py", line 1375, in flex_wrapper
    self._get_axis_number(axis)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 375, in _get_axis_number
    .format(axis, type(self)))
ValueError: ("No axis named 1 for object type <class 'pandas.core.series.Series'>", 'occurred at index 0')

Without using apply:

>>> my_data.iloc[:,3:].divide(cal_r)
    188  189  190  191  192  193  ...   1093  1094  1095  1096  1097  1098
0   1.0  1.0  1.0  1.0  1.0  1.0  ...    1.0   1.0   1.0   1.0   1.0   1.0
1   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
2   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
3   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
4   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
5   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
6   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
7   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
8   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
9   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
10  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
11  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
12  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN

The commands my_data.iloc[:,3:].divide(cal_r, axis=1) and my_data.iloc[:,3:]/cal_r give the same result: only the first row is divided.

If I select just one row, it works:

>>> my_data.iloc[5,3:]/cal_r
       188      189      190    ...         1096      1097      1098
0  48.8182  48.8274  22.4476    ...     0.214338  0.154428  0.116671

[1 rows x 911 columns]

Is there something basic I am missing? I suspect I need to replicate the cal_r row so that it has as many rows as the whole data set.

Any hint or guidance is really appreciated.

Related: Dividing pandas DataFrame elements by the row maximum

Recommended answer

I believe you need to convert the one-row DataFrame to a NumPy array, so that the division is done by a plain array and broadcasts across all rows instead of aligning on the row index:

cal_r = my_data.iloc[(my_data["type"]=="cal").values, 3:]
print (cal_r)
        1096       1097       1098
0  17.949524  16.247619  15.465079

my_data.iloc[:, 3:] /= cal_r.values
print (my_data)
          date  type    id      1096       1097       1098
0   2014-06-13   cal     1  1.000000   1.000000   1.000000
1   2014-06-13   cow    32  0.029161  -0.052579  -0.098348
2   2014-06-13   cow    47  0.427644   0.401395   0.381012
3   2014-06-13   cow   107  0.231857   0.187632   0.156420
4   2014-06-13   cow   137  0.210654   0.157386   0.124890
5   2014-06-13   cow   255  0.214338   0.154428   0.116671
6   2014-06-13   cow   609  0.339715   0.297749   0.274782
7   2014-06-13   cow   721  0.203523   0.145147   0.105614
8   2014-06-14   cow   817  0.336754   0.303693   0.282788
9   2014-06-14   cow   837  0.537603   0.523857   0.509843
10  2014-06-14   cow   980  0.101236   0.033025  -0.006651
11  2014-06-14   cow  1730  0.474251   0.437866   0.408601
12  2014-06-14  dark     1  9.400010  10.332943  10.837319
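
To see why the plain DataFrame division in the question produced NaN everywhere except the first row, here is a minimal sketch on a made-up three-row frame. The data and the names df, values, aligned and broadcast are assumptions for illustration only; they are not the asker's data.

```python
import pandas as pd

# Hypothetical miniature of my_data: three metadata columns, two value columns.
df = pd.DataFrame({
    "date": ["2014-06-13"] * 3,
    "type": ["cal", "cow", "dark"],
    "id": [1, 32, 1],
    "188": [2.0, 1.0, 8.0],
    "189": [4.0, 2.0, 12.0],
})

values = df.iloc[:, 3:]
# One-row DataFrame whose row index is [0].
cal_r = df.loc[df["type"] == "cal"].iloc[:, 3:]

# DataFrame / DataFrame aligns on both row and column labels,
# so only row 0 finds a partner; the other rows become NaN.
aligned = values / cal_r

# A NumPy array of shape (1, n_columns) carries no labels,
# so it is broadcast across every row.
broadcast = values / cal_r.to_numpy()

print(aligned)
print(broadcast)
```

The first print shows 1.0 in row 0 and NaN elsewhere, which matches the behaviour described in the question; the second shows every row divided by the "cal" row.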

Or convert the one-row DataFrame to a Series with DataFrame.squeeze, or select the first row by position to get a Series:

my_data.iloc[:, 3:] = my_data.iloc[:, 3:].div(cal_r.squeeze())
#alternative
#my_data.iloc[:, 3:] = my_data.iloc[:, 3:].div(cal_r.iloc[0])
print (my_data)
          date  type    id      1096       1097       1098
0   2014-06-13   cal     1  1.000000   1.000000   1.000000
1   2014-06-13   cow    32  0.029161  -0.052579  -0.098348
2   2014-06-13   cow    47  0.427644   0.401395   0.381012
3   2014-06-13   cow   107  0.231857   0.187632   0.156420
4   2014-06-13   cow   137  0.210654   0.157386   0.124890
5   2014-06-13   cow   255  0.214338   0.154428   0.116671
6   2014-06-13   cow   609  0.339715   0.297749   0.274782
7   2014-06-13   cow   721  0.203523   0.145147   0.105614
8   2014-06-14   cow   817  0.336754   0.303693   0.282788
9   2014-06-14   cow   837  0.537603   0.523857   0.509843
10  2014-06-14   cow   980  0.101236   0.033025  -0.006651
11  2014-06-14   cow  1730  0.474251   0.437866   0.408601
12  2014-06-14  dark     1  9.400010  10.332943  10.837319
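
If the original frame should stay untouched, the Series-based division can also be written on a copy. This is only a sketch on made-up data: df, value_cols, cal_row and normalized are hypothetical names, and it assumes exactly one row has type "cal".

```python
import pandas as pd

# Hypothetical miniature of my_data: three metadata columns, two value columns.
df = pd.DataFrame({
    "date": ["2014-06-13"] * 3,
    "type": ["cal", "cow", "dark"],
    "id": [1, 32, 1],
    "188": [2.0, 1.0, 8.0],
    "189": [4.0, 2.0, 12.0],
})

# Labels of the value columns (everything after the first three columns).
value_cols = df.columns[3:]

# squeeze(axis=0) turns the one-row DataFrame into a Series indexed by
# column label (assumes exactly one "cal" row).
cal_row = df.loc[df["type"] == "cal", value_cols].squeeze(axis=0)

# div(..., axis="columns") matches the Series index against the column
# labels, so every row is divided by the "cal" row.
normalized = df.copy()
normalized[value_cols] = df[value_cols].div(cal_row, axis="columns")

print(normalized)
```

Passing axis=0 to squeeze keeps the result a Series even if only one value column is selected, where a bare squeeze() would collapse it to a scalar.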
