Set maximum value in DataFrame column
Question
I have the following data points in a pandas DataFrame:
DateTime Data
2017-11-21 18:54:31 1
2017-11-22 02:26:48 2
2017-11-22 10:19:44 3
2017-11-22 15:11:28 6
2017-11-22 23:21:58 7
2017-11-28 14:28:28 28
2017-11-28 14:36:40 0
2017-11-28 14:59:48 1
I want to apply a function that converts all Data values greater than 1 to 1. Is there a way to combine the following two lambda expressions into one (like an else statement)?
[(lambda x: x/x)(x) for x in df['Data'] if x > 0]
[(lambda x: x)(x) for x in df['Data'] if x <1 ]
The desired final result:
DateTime Data
2017-11-21 18:54:31 1
2017-11-22 02:26:48 1
2017-11-22 10:19:44 1
2017-11-22 15:11:28 1
2017-11-22 23:21:58 1
2017-11-28 14:28:28 1
2017-11-28 14:36:40 0
2017-11-28 14:59:48 1
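Recommended answer
You can use clip with upper=1 (pandas versions before 1.0 also offered this as clip_upper, which was deprecated in 0.24 and has since been removed):
df['Data'] = df['Data'].clip(upper=1)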
Or use ge (>=) to build a boolean mask and convert it to int, if there are no negative values:
df['Data'] = df['Data'].ge(1).astype(int)
print (df)
DateTime Data
0 2017-11-21 18:54:31 1
1 2017-11-22 02:26:48 1
2 2017-11-22 10:19:44 1
3 2017-11-22 15:11:28 1
4 2017-11-22 23:21:58 1
5 2017-11-28 14:28:28 1
6 2017-11-28 14:36:40 0
7 2017-11-28 14:59:48 1
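The "no negative values" condition matters because the two approaches only agree on non-negative input. The sketch below illustrates this with a small hypothetical Series (the value -2 is not in the question's data and is added purely for demonstration):

```python
import pandas as pd

# Hypothetical sample that includes a negative value, unlike the question's data.
s = pd.Series([3, 1, 0, -2], name="Data")

clipped = s.clip(upper=1)      # caps values at 1, leaves smaller values untouched
masked = s.ge(1).astype(int)   # True/False for "x >= 1", then cast to 1/0

print(clipped.tolist())  # [1, 1, 0, -2]
print(masked.tolist())   # [1, 1, 0, 0]
```

With clip the negative value is kept, while the boolean mask turns it into 0, so the choice depends on what should happen to values below 0.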
But if you want to use a list comprehension (it should be slower on a larger DataFrame):
df['Data'] = [1 if x > 0 else x for x in df['Data']]
print (df)
DateTime Data
0 2017-11-21 18:54:31 1
1 2017-11-22 02:26:48 1
2 2017-11-22 10:19:44 1
3 2017-11-22 15:11:28 1
4 2017-11-22 23:21:58 1
5 2017-11-28 14:28:28 1
6 2017-11-28 14:36:40 0
7 2017-11-28 14:59:48 1
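If the if/else form of the list comprehension is what you are after but you want to stay vectorized, one possible alternative (not part of the original answer) is numpy.where, which applies the same rule to the whole column at once. A minimal sketch, rebuilding only the Data column from the question:

```python
import numpy as np
import pandas as pd

# Only the Data column from the question; DateTime is omitted for brevity.
df = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})

# Same rule as the list comprehension: 1 where x > 0, otherwise keep x.
df["Data"] = np.where(df["Data"] > 0, 1, df["Data"])

print(df["Data"].tolist())  # [1, 1, 1, 1, 1, 1, 0, 1]
```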
Timings (recorded on an older pandas version in which clip_upper was still available):
#[8000 rows x 5 columns]
df = pd.concat([df]*1000).reset_index(drop=True)
In [28]: %timeit df['Data2'] = df['Data'].clip_upper(1)
1000 loops, best of 3: 308 µs per loop
In [29]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
1000 loops, best of 3: 425 µs per loop
In [30]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
100 loops, best of 3: 3.02 ms per loop
#[800000 rows x 5 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
In [32]: %timeit df['Data2'] = df['Data'].clip_upper(1)
100 loops, best of 3: 9.32 ms per loop
In [33]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
100 loops, best of 3: 4.76 ms per loop
In [34]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
1 loop, best of 3: 274 ms per loop
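The %timeit magic above only works inside IPython. To repeat the comparison in a plain script on a current pandas version, something along these lines could be used. It is a rough sketch: the names base, candidates and results are made up for illustration, clip(upper=1) stands in for the removed clip_upper, and the numbers it prints will differ from machine to machine.

```python
import timeit

import pandas as pd

# Rebuild the question's Data column and repeat it to get 8000 rows.
base = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})
df = pd.concat([base] * 1000).reset_index(drop=True)

# The three approaches from the answer, wrapped as zero-argument callables.
candidates = {
    "clip": lambda: df["Data"].clip(upper=1),
    "ge + astype": lambda: df["Data"].ge(1).astype(int),
    "list comprehension": lambda: [1 if x > 0 else x for x in df["Data"]],
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 100 calls each, reported as seconds per call.
    seconds = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    results[name] = seconds
    print(f"{name}: {seconds * 1e6:.0f} us per call")
```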