Set maximum value in DataFrame column


Question

I have the following data points in a pandas DataFrame:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     2
2017-11-22 10:19:44     3
2017-11-22 15:11:28     6
2017-11-22 23:21:58     7
2017-11-28 14:28:28    28
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

I want to apply a function that converts all Data values greater than 1 to 1. Is there a way to combine the following two lambda functions into one (like an else statement)?

[(lambda x: x/x)(x) for x in df['Data'] if x > 0]
[(lambda x: x)(x) for x in df['Data'] if x <1 ]

Desired final result:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     1
2017-11-22 10:19:44     1
2017-11-22 15:11:28     1
2017-11-22 23:21:58     1
2017-11-28 14:28:28     1
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

Recommended answer

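You can use clip with an upper bound (older pandas versions spelled this clip_upper, which was deprecated in 0.24 and removed in 1.0):

df['Data'] = df['Data'].clip(upper=1)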
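A self-contained version of the clip approach, rebuilding the question's data by hand. This is a minimal sketch assuming pandas 1.0 or later; the column names DateTime and Data come from the question, everything else is illustrative:

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2017-11-21 18:54:31", "2017-11-22 02:26:48", "2017-11-22 10:19:44",
        "2017-11-22 15:11:28", "2017-11-22 23:21:58", "2017-11-28 14:28:28",
        "2017-11-28 14:36:40", "2017-11-28 14:59:48",
    ]),
    "Data": [1, 2, 3, 6, 7, 28, 0, 1],
})

# Cap every value at 1; values already <= 1 are left unchanged
df["Data"] = df["Data"].clip(upper=1)
print(df)
```

clip only touches values above the bound, so it also behaves sensibly for negative or fractional data.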

Or use ge (>=) to build a boolean mask and convert it to int. This only matches the capped result when every value is a non-negative integer, because anything below 1 (for example 0.5 or -2) becomes 0:

df['Data'] = df['Data'].ge(1).astype(int)

print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1
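To see why the ge mask needs non-negative integers, here is a small sketch on hypothetical values (not from the question) where the two approaches disagree:

```python
import pandas as pd

# Hypothetical values chosen to show where the two approaches differ
s = pd.Series([-2, 0, 0.5, 1, 3])

capped = s.clip(upper=1)      # caps only the top end
binary = s.ge(1).astype(int)  # 1 where s >= 1, otherwise 0

print(capped.tolist())  # [-2.0, 0.0, 0.5, 1.0, 1.0]
print(binary.tolist())  # [0, 0, 0, 1, 1]
```

On the question's data (non-negative integers) both give the same column; on data like this, only clip performs a true "set maximum value".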

But if you want to use a list comprehension (it is slower, especially on larger DataFrames, as the timings below show):

df['Data'] = [1 if x > 0 else x for x in df['Data']]
print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1
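Why the question's two comprehensions cannot simply be joined: a trailing if acts as a filter and drops items, while a conditional expression (a if cond else b) produces an output for every item. A minimal sketch using only the Data values from the question; Series.apply is shown as the single-lambda form the question asked about, and like the list comprehension it loops in Python, so it is also slow on large frames:

```python
import pandas as pd

data = pd.Series([1, 2, 3, 6, 7, 28, 0, 1])

# A trailing "if" is a filter: items that fail it are dropped,
# which is why each of the question's comprehensions returns a shorter list.
filtered = [x for x in data if x > 0]

# "a if cond else b" is an expression: every item produces an output.
capped = [1 if x > 0 else x for x in data]

# The same conditional as one lambda, applied element-wise with Series.apply
capped_apply = data.apply(lambda x: 1 if x > 0 else x)

print(len(filtered), len(capped))  # 7 8
print(capped_apply.tolist())       # [1, 1, 1, 1, 1, 1, 0, 1]
```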

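Timings (measured with an older pandas version in which clip_upper still existed; on pandas 1.0 or later use clip(upper=1) instead):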

#[8000 rows x 5 columns]
df = pd.concat([df]*1000).reset_index(drop=True)

In [28]: %timeit df['Data2'] = df['Data'].clip_upper(1)
1000 loops, best of 3: 308 µs per loop

In [29]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
1000 loops, best of 3: 425 µs per loop

In [30]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
100 loops, best of 3: 3.02 ms per loop

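#[800000 rows x 5 columns]; this size corresponds to repeating the original 8-row df 100000 times, not the 8000-row df built above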
df = pd.concat([df]*100000).reset_index(drop=True)

In [32]: %timeit df['Data2'] = df['Data'].clip_upper(1)
100 loops, best of 3: 9.32 ms per loop

In [33]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
100 loops, best of 3: 4.76 ms per loop

In [34]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
1 loop, best of 3: 274 ms per loop
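The %timeit lines above are IPython magics. A plain-Python sketch of a similar comparison with the standard timeit module, assuming pandas 1.0 or later (so clip(upper=1) stands in for clip_upper). The candidates dict, the repeat counts, and the 800000-row size built from the original 8 values are illustrative choices, and your numbers will differ from the ones above:

```python
import timeit

import pandas as pd

# The question's Data values repeated to 800000 rows
base = pd.Series([1, 2, 3, 6, 7, 28, 0, 1])
data = pd.concat([base] * 100000, ignore_index=True)

candidates = {
    "clip(upper=1)": lambda: data.clip(upper=1),
    "ge(1).astype(int)": lambda: data.ge(1).astype(int),
    "list comprehension": lambda: [1 if x > 0 else x for x in data],
}

for name, fn in candidates.items():
    # Best of 3 single calls; absolute numbers depend on hardware and versions
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name}: {best * 1000:.2f} ms per call")
```

Timing only makes sense if the approaches agree; on this non-negative integer data all three return the same values.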

