How to replace values in a range in a pandas dataframe with another value in the same dataframe based on a condition

Problem description

I want to replace values within a range of columns in a dataframe with the corresponding value in another column if the value in the range is greater than zero.

I would think that a simple replace like this would work:

df = df.loc[:,'A':'D'].replace(1, df['column_with_value_I_want'])

But as far as I can tell, that does nothing except drop column_with_value_I_want, which is totally unintended, and I'm not sure why that happens.

This doesn't seem to work either:

df[df.loc[:,'A':'D']] > 0 = df['column_with_value_I_want']

It returns the error: SyntaxError: can't assign to comparison.

This seems like it should be straightforward, but I'm at a loss after trying several different things to no avail.

The dataframe I'm working with looks something like this:

df = pd.DataFrame({'A' : [1,0,0,1,0,0],
                   'B' : [1,0,0,1,0,1],
                   'C' : [1,0,0,1,0,1],
                   'D' : [1,0,0,1,0,0],
                   'column_with_value_I_want' : [22.0,15.0,90.0,10.,None,557.0],})

Recommended answer

Not sure how to do it in Pandas per se, but it's not that difficult if you drop down to numpy.

If you're lucky enough that your entire DataFrame is numerical, you can do so as follows:

>>> import numpy as np
>>> m = df.to_numpy()  # as_matrix() was removed in pandas 1.0
>>> pd.DataFrame(
...     np.where(np.logical_or(np.isnan(m), m > 0), np.tile(m[:, [4]], 5), m),
...     columns=df.columns)
      A      B      C     D  column_with_value_I_want
0  22.0   22.0   22.0  22.0                      22.0
1   0.0    0.0    0.0   0.0                      15.0
2   0.0    0.0    0.0   0.0                      90.0
3  10.0   10.0   10.0  10.0                      10.0
4   0.0    0.0    0.0   0.0                       NaN
5   0.0  557.0  557.0   0.0                     557.0


  • to_numpy (as_matrix in pandas before 1.0) converts a DataFrame to a numpy array.
  • np.where is numpy's ternary conditional.
  • np.logical_or is numpy's elementwise or.
  • np.isnan checks whether a value is NaN.
  • np.tile tiles (in this case) a 2d single column into a matrix.
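
For readers who want to run the steps listed above end to end, here is a minimal self-contained sketch. It assumes a pandas version that provides to_numpy(), and it relies on NumPy broadcasting of the single (n, 1) column instead of np.tile, which is a substitution and not the answer's exact code. The names idx, want and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 0, 0, 1, 0, 0],
    'B': [1, 0, 0, 1, 0, 1],
    'C': [1, 0, 0, 1, 0, 1],
    'D': [1, 0, 0, 1, 0, 0],
    'column_with_value_I_want': [22.0, 15.0, 90.0, 10.0, None, 557.0],
})

m = df.to_numpy()  # 2d float array; the None entry becomes NaN

# Look up the position of the source column instead of hard-coding 4.
idx = df.columns.get_loc('column_with_value_I_want')
want = m[:, [idx]]  # shape (6, 1); broadcasts across every column of m

# Take the wanted value where a cell is NaN or positive, else keep the cell.
result = pd.DataFrame(np.where(np.isnan(m) | (m > 0), want, m),
                      columns=df.columns)
print(result)
```

Because the source column itself is positive or NaN in every row, it is mapped onto itself, so only columns A to D change.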
Unfortunately, the above will fail if some of your columns (even those not involved in this operation) are inherently non-numerical. In this case, you can do the following:

for col in ['A', 'B', 'C', 'D']:
    # Where the cell is positive, take the wanted value; otherwise keep the cell.
    df[col] = np.where(df[col] > 0, df.column_with_value_I_want, df[col])
      

This will work as long as the 5 relevant columns are numerical.
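
As a rough illustration of the per-column approach, the sketch below adds a string column to the question's frame and runs the loop. The label column is hypothetical and exists only to show that unrelated non-numeric columns are left alone. Note the argument order of np.where: the second argument is used where the condition is true, so the replacement value goes there.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 0, 0, 1, 0, 0],
    'B': [1, 0, 0, 1, 0, 1],
    'C': [1, 0, 0, 1, 0, 1],
    'D': [1, 0, 0, 1, 0, 0],
    'column_with_value_I_want': [22.0, 15.0, 90.0, 10.0, None, 557.0],
    'label': list('uvwxyz'),  # hypothetical non-numeric column
})

for col in ['A', 'B', 'C', 'D']:
    # Positive cells take the wanted value; all other cells are kept.
    df[col] = np.where(df[col] > 0, df['column_with_value_I_want'], df[col])

print(df)
```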

This uses a loop (which is frowned upon in numerical Python), but at least it loops over columns, not rows. Assuming your data is longer than it is wide, it should be OK.
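
If you would rather avoid the explicit loop, one possible pandas-only variant is DataFrame.mask with axis=0. This is not part of the original answer; it is a sketch under the assumption that the four target columns are numeric, and the name cols is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 0, 0, 1, 0, 0],
    'B': [1, 0, 0, 1, 0, 1],
    'C': [1, 0, 0, 1, 0, 1],
    'D': [1, 0, 0, 1, 0, 0],
    'column_with_value_I_want': [22.0, 15.0, 90.0, 10.0, None, 557.0],
})

cols = ['A', 'B', 'C', 'D']

# mask() replaces cells where the condition holds. axis=0 aligns the
# replacement Series with the row index, so each row uses its own value.
df[cols] = df[cols].mask(df[cols] > 0, df['column_with_value_I_want'], axis=0)

print(df)
```

Only the selected columns are touched, so other columns of any dtype can stay in the frame.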
