Exclude columns from pandas where()
Question
I have the following pandas DataFrame:
import pandas as pd
import numpy as np
pd_df = pd.DataFrame({'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', np.nan, 'banana', 'banana', 'banana'],
'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg']})
I'd like to apply where() to only two columns, Qu1 and Qu2, and keep the rest as is (original Stack Overflow question), so I created pd1:
pd1 = pd_df.where(pd_df.apply(lambda x: x.map(x.value_counts()))>=2,
"other")[['Qu1', 'Qu2']]
Then I added the remaining column of pd_df, pd_df['Qu3'], to pd1:
pd1['Qu3'] = pd_df['Qu3']
pd_df = []
My question is: originally I wanted to execute where() on part of the df and keep the rest of the columns as is. Could the code above be dangerous for a large dataset? Can I harm the original data this way? If so, what is the best way to do it?
Thanks a lot!
Recommended answer
You could explicitly take a copy of the original df and then overwrite a selection of that copy:
In [40]:
pd1 = pd_df.copy()
pd1[['Qu1', 'Qu2']] = pd1[['Qu1', 'Qu2']].where(pd_df.apply(lambda x: x.map(x.value_counts()))>=2,
"other")
pd1
Out[40]:
      Qu1     Qu2      Qu3
0   other   other    apple
1  potato  banana   potato
2  cheese   apple  sausage
3  banana   apple   cheese
4  cheese   apple   cheese
5  banana   other   potato
6  cheese  banana   cheese
7  potato  banana   potato
8   other  banana      egg
So the difference here is that we only operate on a section of the df, rather than operating on the whole df and then selecting the columns of interest.
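As a minimal sketch of the same idea, the mask can also be built from the selected columns only, so that value counts are never computed for columns that stay untouched. The names `cols`, `subset`, `counts` and `original` are illustrative and not part of the answer above. The final assertion checks that `pd_df` itself is left unchanged, since `where()` returns a new object by default.

```python
import numpy as np
import pandas as pd

pd_df = pd.DataFrame({
    'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
    'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', np.nan, 'banana', 'banana', 'banana'],
    'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg'],
})

cols = ['Qu1', 'Qu2']          # illustrative: the columns to modify
original = pd_df.copy()        # kept only to verify pd_df is not mutated

pd1 = pd_df.copy()
subset = pd_df[cols]
# Per-column frequency of each value; NaN entries map to NaN
counts = subset.apply(lambda s: s.map(s.value_counts()))
# Keep values seen at least twice; everything else (including NaN) becomes "other"
pd1[cols] = subset.where(counts >= 2, "other")

# The source frame is unchanged; only the copy was overwritten
assert pd_df.equals(original)
```

Because the result is assigned into a copy, `Qu3` is carried over without being re-attached by hand.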
Update
If you just want to overwrite those columns in place, select only those columns:
In [48]:
pd_df[['Qu1', 'Qu2']] = pd_df[['Qu1', 'Qu2']].where(pd_df.apply(lambda x: x.map(x.value_counts()))>=2,
"other")
pd_df
Out[48]:
      Qu1     Qu2      Qu3
0   other   other    apple
1  potato  banana   potato
2  cheese   apple  sausage
3  banana   apple   cheese
4  cheese   apple   cheese
5  banana   other   potato
6  cheese  banana   cheese
7  potato  banana   potato
8   other  banana      egg
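If the overwrite is needed in more than one place, it could be wrapped in a small helper. The sketch below is one possible shape, not part of the original answer: the function name `replace_rare` and its parameters `min_count` and `other` are hypothetical. It works column by column with `Series.where` and returns a new frame, so the caller decides whether to rebind the original name.

```python
import numpy as np
import pandas as pd

def replace_rare(df, cols, min_count=2, other="other"):
    """Return a copy of df in which, for each column in cols, values that
    occur fewer than min_count times (or are NaN) are replaced by other.
    Hypothetical helper for illustration only."""
    out = df.copy()
    for col in cols:
        # Frequency of each value in this column; NaN maps to NaN
        counts = out[col].map(out[col].value_counts())
        out[col] = out[col].where(counts >= min_count, other)
    return out

frame = pd.DataFrame({
    'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
    'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', np.nan, 'banana', 'banana', 'banana'],
    'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg'],
})

result = replace_rare(frame, ['Qu1', 'Qu2'])
```

Writing `frame = replace_rare(frame, ['Qu1', 'Qu2'])` would mirror the in-place overwrite shown above, while keeping the option of holding on to the untouched data under another name.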