Compare two columns using pandas

Views: 165 Published: 2017/3/25 23:47:46 Tags: python pandas if-statement dataframe


Question

Using this as a starting point:

import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

Out[8]: 
  one  two three
0   10  1.2   4.2
1   15  70   0.03
2    8   5     0

I want to use something like an if statement within pandas:

if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']

Basically, check each row via the if statement and create a new column.

The docs say to use .all, but there is no example...

Recommended answer

You could use np.where. If cond is a boolean array, and A and B are arrays, then

C = np.where(cond, A, B)

defines C to be equal to A where cond is True, and B where cond is False.

import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three']),
                     df['one'], np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN
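For contrast, here is a minimal sketch of why the plain if statement from the question does not work, and why .all is not a per-row test either. It reuses the DataFrame from the question; the variable names cond and raised are only illustrative.

```python
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

# Comparing two columns gives one boolean per row, not a single True/False.
cond = (df['one'] >= df['two']) & (df['one'] <= df['three'])
print(cond.tolist())

# `if` and `and` need a single truth value, so pandas raises ValueError here.
try:
    if df['one'] >= df['two'] and df['one'] <= df['three']:
        pass
    raised = False
except ValueError:
    raised = True
print(raised)

# .all() collapses the whole column into one bool for the entire DataFrame.
print(cond.all())
```

This is why the answer builds a per-row boolean mask with & and hands it to np.where rather than branching with if.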

If you have more than one condition, then you could use np.select instead.
For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then

conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]

choices = [df['one'], df['two']]

df['que'] = np.select(conditions, choices, default=np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

If we can assume that df['one'] >= df['two'] when df['one'] < df['two'] is False, then the conditions and choices could be simplified to

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]

choices = [df['two'], df['one']]

(The assumption may not be true if df['one'] or df['two'] contains NaNs.)
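A small sketch of that caveat, using a hypothetical numeric frame called demo (not part of the original question). np.select takes the first condition that is True for each row, so the two versions agree on ordinary rows but can differ on a row where 'two' is NaN, because every comparison against NaN is False.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; the last row has a NaN in 'two'.
demo = pd.DataFrame({
    'one':   [2.0, 1.0, 9.0, 3.0],
    'two':   [1.0, 5.0, 3.0, np.nan],
    'three': [4.0, 4.0, 4.0, 5.0],
})

# Full conditions, as in the first np.select example.
full = np.select(
    [(demo['one'] >= demo['two']) & (demo['one'] <= demo['three']),
     demo['one'] < demo['two']],
    [demo['one'], demo['two']],
    default=np.nan)

# Simplified conditions: the "<" test comes first, so the second
# condition is only reached when one < two is False.
simplified = np.select(
    [demo['one'] < demo['two'], demo['one'] <= demo['three']],
    [demo['two'], demo['one']],
    default=np.nan)

print(full)
print(simplified)
```

On the last row the full version falls through to the default, while the simplified version selects the value from 'one'.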

Note that

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:

df2 = df.astype(float)

This changes the results, however, since strings compare character-by-character, while floats are compared numerically.

In [61]: '10' <= '4.2'
Out[61]: True

In [62]: 10 <= 4.2
Out[62]: False
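To see the effect on the example itself, the following sketch evaluates the same condition on the string columns and on the float-converted copy. The names cond_str and cond_num are illustrative only.

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df2 = df.astype(float)

# Same condition, evaluated on strings and on floats.
cond_str = (df['one'] >= df['two']) & (df['one'] <= df['three'])
cond_num = (df2['one'] >= df2['two']) & (df2['one'] <= df2['three'])

print(cond_str.tolist())
print(cond_num.tolist())

# With numeric comparison no row satisfies two <= one <= three,
# so every entry of 'que' ends up as NaN.
df2['que'] = np.where(cond_num, df2['one'], np.nan)
print(df2)
```

The first row passes the string comparison only because '10' sorts before '4.2' character by character; numerically, 10 is not less than or equal to 4.2.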
