Compare two columns using pandas

Question

Using this as a starting point:

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

Out[8]: 
  one  two three
0   10  1.2   4.2
1   15  70   0.03
2    8   5     0

I want to use something like an if statement within pandas.

if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']

Basically, check each row via the if statement, create new column.

The docs say to use .all but there is no example...

Recommended answer

You could use np.where. If cond is a boolean array, and A and B are arrays, then

C = np.where(cond, A, B)

defines C to be equal to A where cond is True, and B where cond is False.
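As a minimal sketch of that rule, here is np.where applied to three small arrays. The values of cond, A and B are made up for illustration and do not come from the question.

```python
import numpy as np

# Hypothetical inputs, chosen only to show the element-wise selection.
cond = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])

# Take A where cond is True, B where cond is False.
C = np.where(cond, A, B)
print(C.tolist())  # [1, 20, 3]
```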

import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three']),
                     df['one'], np.nan)

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

If you have more than one condition, you could use np.select instead. For example, if you want df['que'] to equal df['two'] when df['one'] < df['two'], then

conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]

choices = [df['one'], df['two']]

df['que'] = np.select(conditions, choices, default=np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

If we can assume that df['one'] >= df['two'] when df['one'] < df['two'] is False, then the conditions and choices could be simplified to

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]

choices = [df['two'], df['one']]
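The simplification works because np.select uses, for each row, the first condition in the list that is True, so the second condition is only consulted where df['one'] < df['two'] is False. A small sketch of this, using hypothetical numeric data rather than the string DataFrame from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data, chosen so that each branch is hit once.
df = pd.DataFrame({'one': [3.0, 15.0, 8.0],
                   'two': [1.2, 70.0, 5.0],
                   'three': [4.2, 0.03, 0.0]})

conditions = [
    df['one'] < df['two'],      # checked first
    df['one'] <= df['three']]   # only used where the first condition is False

choices = [df['two'], df['one']]

# Rows matching neither condition fall back to the default.
que = np.select(conditions, choices, default=np.nan)
print(que)  # row 0 -> 3.0 (one), row 1 -> 70.0 (two), row 2 -> nan
```

Swapping the order of the two conditions (and of the choices) could change the result for rows where both conditions are True, so the order is part of the logic.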

(The assumption may not be true if df['one'] or df['two'] contain NaNs.)
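To see why NaNs break the assumption, note that any ordering comparison involving NaN is False, so both df['one'] < df['two'] and df['one'] >= df['two'] can be False for the same row. A short sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has a missing value in 'one'.
df = pd.DataFrame({'one': [1.0, np.nan], 'two': [2.0, 3.0]})

lt = df['one'] < df['two']
ge = df['one'] >= df['two']

# For the NaN row, neither comparison holds.
print(lt.tolist())  # [True, False]
print(ge.tolist())  # [False, False]
```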

Note that

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:

df2 = df.astype(float)

This changes the results, however, since strings compare character-by-character, while floats are compared numerically.

In [61]: '10' <= '4.2'
Out[61]: True

In [62]: 10 <= 4.2
Out[62]: False
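As a sketch of how much this matters for the DataFrame in the question, the np.where example can be rerun on the float-converted copy. The names df2 and in_range are just local variables for this illustration. With numeric comparison, no row satisfies both conditions, so every value of que is NaN, whereas the string version kept 10 in the first row.

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

# Convert the string columns to floats so comparisons are numeric.
df2 = df.astype(float)

# Same condition as before, now evaluated on numbers.
in_range = (df2['one'] >= df2['two']) & (df2['one'] <= df2['three'])
df2['que'] = np.where(in_range, df2['one'], np.nan)

print(df2['que'].isna().all())  # True: no row has two <= one <= three
```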
