Replace specific value in pandas dataframe column, else convert column to numeric

Problem description

Given the following pandas dataframe:

+----+------------------+-------------------------------------+--------------------------------+
|    |   AgeAt_X        |   AgeAt_Y                           |   AgeAt_Z                      |
|----+------------------+-------------------------------------+--------------------------------+
|  0 |   Older than 100 |                      Older than 100 |                          74.13 |
|  1 |              nan |                                 nan |                          58.46 |
|  2 |              nan |                                 8.4 |                          54.15 |
|  3 |              nan |                                 nan |                          57.04 |
|  4 |              nan |                               57.04 |                            nan |
+----+------------------+-------------------------------------+--------------------------------+

How can I replace the string Older than 100 with nan, so that the dataframe looks like this?

+----+------------------+-------------------------------------+--------------------------------+
|    |   AgeAt_X        |   AgeAt_Y                           |   AgeAt_Z                      |
|----+------------------+-------------------------------------+--------------------------------+
|  0 |              nan |                                 nan |                          74.13 |
|  1 |              nan |                                 nan |                          58.46 |
|  2 |              nan |                                 8.4 |                          54.15 |
|  3 |              nan |                                 nan |                          57.04 |
|  4 |              nan |                               57.04 |                            nan |
+----+------------------+-------------------------------------+--------------------------------+

Notes

  • After removing the Older than 100 string from the desired columns, I convert those columns to numeric so that I can perform calculations on them.
  • There are other columns in this dataframe (excluded from this example) that will not be converted to numeric, so the conversion must be done one column at a time.

My attempts

Attempt 1

if df.isin('Older than 100'):
    df.loc[df['AgeAt_X']] = ''
else:
    df['AgeAt_X'] = pd.to_numeric(df["AgeAt_X"])

Attempt 2

if df.loc[df['AgeAt_X']] == 'Older than 100':
    df.loc[df['AgeAt_X']] = ''
elif df.loc[df['AgeAt_X']] == '':
    df['AgeAt_X'] = pd.to_numeric(df["AgeAt_X"])

Attempt 3

df['AgeAt_X'] = ['' if ele == 'Older than 100' else df.loc[df['AgeAt_X']] for ele in df['AgeAt_X']]

Attempts 1, 2 and 3 return the following error:

KeyError: 'None of [0 NaN\n1 NaN\n2 NaN\n3 NaN\n4 NaN\n5 NaN\n6 NaN\n7 NaN\n8 NaN\n9 NaN\n10 NaN\n11 NaN\n12 NaN\n13 NaN\n14 NaN\n15 NaN\n16 NaN\n17 NaN\n18 NaN\n19 NaN\n20 NaN\n21 NaN\n22 NaN\n23 NaN\n24 NaN\n25 NaN\n26 NaN\n27 NaN\n ..\n6347 NaN\n6348 NaN\n6349 NaN\n6350 NaN\n6351 NaN\n6352 NaN\n6353 NaN\n6354 NaN\n6355 NaN\n6356 NaN\n6357 NaN\n ..\nLength: 6362, dtype: float64] are in the [index]'

Attempt 4

df['AgeAt_X'] = df['AgeAt_X'].replace({'Older than 100': ''})

Attempt 4 returns the following error:

TypeError: Cannot compare types 'ndarray(dtype=float64)' and 'str'

I have also looked at a few posts. The two below do not actually replace the value; instead they create a new column derived from other columns:

Replace specific values in Pandas DataFrame

Pandas replace DataFrame values

Recommended answer

We can loop through each column and check whether the sentence is present. On a hit, we replace the sentence with the string 'NaN' using Series.str.replace, then convert the column to numeric with Series.astype, in this case float (astype(float) parses the string 'NaN' as a missing value):

df.dtypes
AgeAt_X     object
AgeAt_Y     object
AgeAt_Z    float64
dtype: object

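sent = 'Older than 100'

for col in df.columns:
    # only touch columns that actually contain the sentence
    if sent in df[col].values:
        # 'NaN' is still a string here; astype(float) turns it into a real NaN
        df[col] = df[col].str.replace(sent, 'NaN')
        df[col] = df[col].astype(float)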

print(df)
   AgeAt_X  AgeAt_Y  AgeAt_Z
0      NaN      NaN    74.13
1      NaN      NaN    58.46
2      NaN     8.40    54.15
3      NaN      NaN    57.04
4      NaN    57.04      NaN

df.dtypes
AgeAt_X    float64
AgeAt_Y    float64
AgeAt_Z    float64
dtype: object
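
As an alternative, here is a minimal sketch that leans on Series.mask and pd.to_numeric instead of string replacement. It is not part of the original answer. It assumes the numbers in the object columns are stored as strings, that the columns to convert are listed by hand (age_cols), and it adds a hypothetical Name column to stand in for the non-numeric columns the question mentions:

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question. 'Name' is a hypothetical extra
# column representing the columns that must stay non-numeric.
df = pd.DataFrame({
    'AgeAt_X': ['Older than 100', np.nan, np.nan, np.nan, np.nan],
    'AgeAt_Y': ['Older than 100', np.nan, '8.4', np.nan, '57.04'],
    'AgeAt_Z': [74.13, 58.46, 54.15, 57.04, np.nan],
    'Name': ['a', 'b', 'c', 'd', 'e'],
})

sent = 'Older than 100'
age_cols = ['AgeAt_X', 'AgeAt_Y', 'AgeAt_Z']  # assumption: chosen by hand

for col in age_cols:
    # mask() puts NaN wherever the condition is True
    cleaned = df[col].mask(df[col] == sent)
    # to_numeric converts one column at a time and raises on any other
    # unexpected string instead of silently hiding it
    df[col] = pd.to_numeric(cleaned)

print(df.dtypes)
```

Listing the columns explicitly keeps the conversion to one column at a time and leaves the remaining columns untouched. If other stray strings should also become NaN, pd.to_numeric(df[col], errors='coerce') would do that in a single step, at the cost of hiding unexpected values.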
