How to assign a value to a pandas DataFrame when subsetting by a complex index and boolean conditions?

Question

I would like to replace values in a pandas DataFrame using a complex subsetting pattern.

With the .loc accessor, I was only able to subset by chaining multiple conditions, because some of the conditions are index based. But it seems I cannot assign values after such a chain of subsetting. UPDATE: A further problem is caused by the duplicated indices. I have updated the example accordingly.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['foo'] * 10 + ['bar'] * 10, 'b': range(20)}, index=pd.date_range('2019-01-01','2019-01-10').append(pd.date_range('2019-01-01','2019-01-10')))

df.loc[df['a'] == 'foo', 'b'].loc[pd.to_datetime(['2019-01-05','2019-01-09'])] = np.nan

df

Result:

              a     b
2019-01-01  foo     0
2019-01-02  foo     1
2019-01-03  foo     2
2019-01-04  foo     3
2019-01-05  foo     4
2019-01-06  foo     5
2019-01-07  foo     6
2019-01-08  foo     7
2019-01-09  foo     8
2019-01-10  foo     9
2019-01-01  bar     10
2019-01-02  bar     11
2019-01-03  bar     12
2019-01-04  bar     13
2019-01-05  bar     14
2019-01-06  bar     15
2019-01-07  bar     16
2019-01-08  bar     17
2019-01-09  bar     18
2019-01-10  bar     19

Expected:

              a     b
2019-01-01  foo     0
2019-01-02  foo     1
2019-01-03  foo     2
2019-01-04  foo     3
2019-01-05  foo     NaN
2019-01-06  foo     5
2019-01-07  foo     6
2019-01-08  foo     7
2019-01-09  foo     NaN
2019-01-10  foo     9
2019-01-01  bar     10
2019-01-02  bar     11
2019-01-03  bar     12
2019-01-04  bar     13
2019-01-05  bar     14
2019-01-06  bar     15
2019-01-07  bar     16
2019-01-08  bar     17
2019-01-09  bar     18
2019-01-10  bar     19

I have tried alternative approaches, such as:

df.loc[df['a'] == 'foo' and df.index.isin(['2019-01-05','2019-01-09']), 'b']

which raises:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Not even this works, because isin returns an array without the date-based index:

df['a'] == 'foo' and pd.Series(df.index.isin(['2019-01-05','2019-01-09']))
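A note on the two attempts above, as a minimal sketch rather than a definitive explanation: Python's `and` asks for a single truth value of its left operand, which pandas refuses for a multi-element Series, while `&` combines the operands element by element. `Index.isin` returns a plain NumPy boolean array, and a Series can be combined with such an array by position, so no wrapping in `pd.Series` is needed. The frame `toy` and the names `is_foo`, `on_day`, and `and_failed` are hypothetical and not part of the question.

```python
import pandas as pd

# Hypothetical toy frame (not the question's data); the index has a duplicate date.
toy = pd.DataFrame(
    {'a': ['foo', 'foo', 'bar'], 'b': [0, 1, 2]},
    index=pd.to_datetime(['2019-01-05', '2019-01-06', '2019-01-05']),
)

is_foo = toy['a'] == 'foo'                               # boolean Series indexed by date
on_day = toy.index.isin(pd.to_datetime(['2019-01-05']))  # plain NumPy boolean array

# `and` needs one truth value for the whole Series, so pandas raises ValueError.
try:
    is_foo and on_day
    and_failed = False
except ValueError:
    and_failed = True

# `&` works element-wise; the array is matched to the Series by position.
mask = is_foo & on_day
print(and_failed)
print(mask.tolist())
```

Only the first row is both 'foo' and dated 2019-01-05, so it is the only row the mask selects, even though that date appears twice in the index.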

Recommended answer

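You can do this with a single .loc call. Chaining .loc calls for assignment is not safe: the first .loc returns a new object, so the value is assigned to that temporary result rather than to df. Combine both conditions into one boolean mask with & instead: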

df.loc[df.index.isin(['2019-01-05', '2019-01-09']) & df.a.eq('foo'), 'b'] = np.nan
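The single-mask assignment can be checked end to end with the frame from the question. This is a sketch under two assumptions that are not in the original: column `b` is created as float up front, because recent pandas versions warn about (and may reject) writing NaN into an integer column, and the target dates are passed through `pd.to_datetime` instead of as raw strings. The names `dates`, `targets`, and `mask` are illustrative.

```python
import numpy as np
import pandas as pd

# Same layout as the question: every date appears twice in the index.
dates = pd.date_range('2019-01-01', '2019-01-10')
df = pd.DataFrame(
    # Assumption: b is stored as float so NaN fits without a dtype change.
    {'a': ['foo'] * 10 + ['bar'] * 10, 'b': np.arange(20, dtype=float)},
    index=dates.append(dates),
)

targets = pd.to_datetime(['2019-01-05', '2019-01-09'])

# One row mask built from the column condition and the index condition.
mask = df['a'].eq('foo') & df.index.isin(targets)

# A single .loc call writes into df itself, not into a temporary copy.
df.loc[mask, 'b'] = np.nan

print(df[df.index.isin(targets)])
```

Because the mask is evaluated row by row, the 'bar' rows that share the same dates keep their values, which is what the duplicated index would otherwise make awkward with label-based selection.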
