How to replace 'any strings' with NaN in a pandas DataFrame using a boolean mask?

Problem description

I have a 227x4 DataFrame of country names and numerical values that I need to clean (wrangle?).

Here's an abstraction of the DataFrame:

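import random
import string

import numpy as np
import pandas as pd

# Six random three-letter "country names"
pdn = pd.DataFrame(
    ["".join(random.choice(string.ascii_letters) for _ in range(3)) for _ in range(6)],
    columns=['Country Name'])
# Random integers from 1 to 10 (np.random.random_integers is deprecated; randint excludes the upper bound).
# Cast to object so that strings can be placed in these columns without a dtype upcast.
measures = pd.DataFrame(np.random.randint(1, 11, size=(6, 2)),
                        columns=['Measure1', 'Measure2']).astype(object)
df = pdn.merge(measures, how='inner', left_index=True, right_index=True)

# Inject stray strings into the numeric columns
df.iloc[4, 1] = 'str'
df.iloc[1, 2] = 'stuff'
print(df)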

  Country Name Measure1 Measure2
0          tua        6        3
1          MDK        3    stuff
2          RJU        7        2
3          WyB        7        8
4          Nnr      str        3
5          rVN        7        4

How do I replace string values with np.nan in all columns without touching the country names?

I tried using a boolean mask:

mask = df.loc[:,measures.columns].applymap(lambda x: isinstance(x, (int, float))).values
print(mask)

[[ True  True]
 [ True False]
 [ True  True]
 [ True  True]
 [False  True]
 [ True  True]]

# I thought the following would replace the False entries with np.nan in place, but it didn't
df.loc[:,measures.columns].where(mask, inplace=True)
print(df)

  Country Name Measure1 Measure2
0          tua        6        3
1          MDK        3    stuff
2          RJU        7        2
3          WyB        7        8
4          Nnr      str        3
5          rVN        7        4


# This gives the right output, but unfortunately the country names are missing
print(df.loc[:,measures.columns].where(mask))

  Measure1 Measure2
0        6        3
1        3      NaN
2        7        2
3        7        8
4      NaN        3
5        7        4

I have looked at several questions related to mine ([1], [2], [3], [4], [5], [6], [7], [8]), but could not find one that answered my concern.

Recommended answer

Assign the result back to only the columns of interest:

cols = ['Measure1','Measure2']
mask = df[cols].applymap(lambda x: isinstance(x, (int, float)))

df[cols] = df[cols].where(mask)
print(df)

  Country Name Measure1 Measure2
0          uFv        7        8
1          vCr        5      NaN
2          qPp        2        6
3          QIC       10       10
4          Suy      NaN        8
5          eFS        6        4
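
The attempt in the question most likely had no effect because df.loc[:, measures.columns] hands back a new object when given a list of columns, so where(..., inplace=True) modified that temporary object and never reached df. Assigning the result of where() back to df[cols] avoids the problem. The sketch below replays this on a small hypothetical frame (the three rows are made up for illustration, not the asker's data). It builds the mask with Series.map so it does not rely on DataFrame.applymap, which pandas 2.1 deprecated in favor of DataFrame.map. It also shows pd.to_numeric(errors='coerce') as a possible alternative, with the caveat that to_numeric would also convert numeric-looking strings such as '5', which the isinstance check would turn into NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question; object dtype lets
# numbers and strings coexist in the measure columns.
df = pd.DataFrame({
    'Country Name': ['tua', 'MDK', 'Nnr'],
    'Measure1': pd.Series([6, 3, 'str'], dtype=object),
    'Measure2': pd.Series([3, 'stuff', 2], dtype=object),
})
cols = ['Measure1', 'Measure2']

# Alternative, computed first from the untouched data: coerce each column
# to a numeric dtype; values that cannot be parsed become NaN.
coerced = df[cols].apply(pd.to_numeric, errors='coerce')

# Mask approach: True where the cell holds a number, False where it holds a string.
mask = df[cols].apply(lambda s: s.map(lambda x: isinstance(x, (int, float))))

# where() returns a new DataFrame, so the result is assigned back to the
# selected columns; 'Country Name' is never touched.
df[cols] = df[cols].where(mask)

print(df)
print(coerced)
```

The mask approach leaves the measure columns as object dtype, whereas the to_numeric variant yields float64 columns, which may be more convenient if the next step is arithmetic.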

A meta-question: is it normal that it takes me more than 3 hours to formulate a question here (including research)?

In my opinion, yes. Writing a good question is really hard.
