Compare two columns using pandas
Question
Using this as a starting point:
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
Out[8]:
  one  two three
0  10  1.2   4.2
1  15   70  0.03
2   8    5     0
I want to use something like an if statement within pandas.
if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']
Basically, check each row via the if statement and create a new column.
The docs say to use .all, but there is no example...
Recommended answer
You could use np.where. If cond is a boolean array, and A and B are arrays, then
C = np.where(cond, A, B)
defines C to be equal to A where cond is True, and B where cond is False.
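As a minimal sketch of that element-wise behavior, here is np.where on three small arrays. The values of cond, A, and B are made up for illustration and do not come from the question.

```python
import numpy as np

# Illustrative inputs (assumed values, not from the question's DataFrame)
cond = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])

# Take A where cond is True, B where cond is False
C = np.where(cond, A, B)
print(C.tolist())  # [1, 20, 3]
```

The same call accepts pandas Series for cond, A, and B, which is what the DataFrame example relies on.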
import numpy as np
import pandas as pd
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three']),
                     df['one'], np.nan)
yields
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN
If you have more than one condition, you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then
conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']),
    df['one'] < df['two']]
choices = [df['one'], df['two']]
df['que'] = np.select(conditions, choices, default=np.nan)
yields
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN
If we can assume that df['one'] >= df['two'] when df['one'] < df['two'] is False, then the conditions and choices could be simplified to
conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]
choices = [df['two'], df['one']]
(The assumption may not be true if df['one'] or df['two'] contains NaNs.)
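To illustrate that caveat, here is a rough sketch comparing the full and the simplified conditions on a small numeric DataFrame. The data is hypothetical, chosen so that one row of 'two' holds a NaN; the names full and simple are introduced only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical float data; the NaN in the last row of 'two' is deliberate
df = pd.DataFrame({
    'one':   [10.0, 15.0, 8.0],
    'two':   [1.2, 70.0, np.nan],
    'three': [40.0, 0.03, 9.0],
})

# Full conditions, as in the first np.select example
full = np.select(
    [(df['one'] >= df['two']) & (df['one'] <= df['three']),
     df['one'] < df['two']],
    [df['one'], df['two']],
    default=np.nan)

# Simplified conditions; np.select picks the first condition that is True
simple = np.select(
    [df['one'] < df['two'],
     df['one'] <= df['three']],
    [df['two'], df['one']],
    default=np.nan)

print(full.tolist())    # [10.0, 70.0, nan]
print(simple.tolist())  # [10.0, 70.0, 8.0]
```

The first two rows agree. In the last row every comparison against NaN is False, so the full version falls through to the default, while the simplified version skips the check against 'two' and returns 8.0.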
Note that
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:
df2 = df.astype(float)
This changes the results, however, since strings compare character-by-character, while floats are compared numerically.
In [61]: '10' <= '4.2'
Out[61]: True
In [62]: 10 <= 4.2
Out[62]: False
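As a sketch of how much this matters for the example data, the np.where condition can be rerun on both the string and the float version of the DataFrame. The variable names que_str and que_float are introduced here only for the comparison.

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df2 = df.astype(float)  # same data, compared numerically from here on

# String comparison: '10' >= '1.2' and '10' <= '4.2' are both True
que_str = np.where((df['one'] >= df['two']) & (df['one'] <= df['three']),
                   df['one'], np.nan)

# Float comparison: 10 <= 4.2 is False, so row 0 no longer matches
que_float = np.where((df2['one'] >= df2['two']) & (df2['one'] <= df2['three']),
                     df2['one'], np.nan)

print(que_str[0])                 # 10
print(np.isnan(que_float).all())  # True
```

With floats, no row satisfies two <= one <= three, so the whole que column becomes NaN.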