pandas Series replace using dictionary with regex keys

Problem description

Suppose there is a dataframe defined as

import pandas as pd

df = pd.DataFrame({'Col_1': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', '0'], 
                   'Col_2': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', '0']})

which looks like

   Col_1 Col_2
0      A     a
1      B     b
2      C     c
3      D     d
4      E     e
5      F     f
6      G     g
7      H     h
8      I     i
9      J     j
10     0     0

I want to replace the values in Col_1 using a dictionary defined as

import re

repl_dict = {re.compile('[ABH-LP-Z]'): 'DDD',
             re.compile('[CDEFG]'): 'BBB WTT',
             re.compile('[MNO]'): 'AAA WTT',
             re.compile('[0-9]'): 'CCC'}

I would expect to get a new dataframe in which Col_1 looks as follows

      Col_1
0       DDD
1       DDD
2   BBB WTT
3   BBB WTT
4   BBB WTT
5   BBB WTT
6   BBB WTT
7       DDD
8       DDD
9       DDD
10      CCC

I simply used df['Col_1'].replace(repl_dict, regex=True). However, it does not produce what I expected. What I got is:

                      Col_1
0     BBB WTTBBB WTTBBB WTT
1     BBB WTTBBB WTTBBB WTT
2                   BBB WTT
3                   BBB WTT
4                   BBB WTT
5                   BBB WTT
6                   BBB WTT
7     BBB WTTBBB WTTBBB WTT
8     BBB WTTBBB WTTBBB WTT
9     BBB WTTBBB WTTBBB WTT
10                      CCC

I would very much appreciate it if anyone could explain why df.replace() was not working for me and what the correct way is to replace multiple values to get the expected output.

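Recommended answer

The keys are unanchored, so each pattern matches any single character inside a value, and the output shows that the replacements are applied one after another, each on the result of the previous one. 'A' is first turned into 'DDD' by [ABH-LP-Z]; every 'D' in 'DDD' then matches [CDEFG] and is replaced with 'BBB WTT', which yields 'BBB WTTBBB WTTBBB WTT'.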
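The chained output from the question can be reproduced without pandas. The sketch below is a hypothetical model, not pandas' actual implementation: it assumes the dictionary's pairs are applied in insertion order, with each re.sub() running on the output of the previous one. The function name replace_in_sequence is made up for illustration.

```python
import re

# The question's unanchored keys, in the same insertion order.
unanchored = {re.compile('[ABH-LP-Z]'): 'DDD',
              re.compile('[CDEFG]'): 'BBB WTT',
              re.compile('[MNO]'): 'AAA WTT',
              re.compile('[0-9]'): 'CCC'}

def replace_in_sequence(value, repl):
    """Hypothetical model of a chained regex replace: run every
    pattern -> replacement pair in dict order, each sub() operating
    on the result of the previous substitution."""
    for pattern, replacement in repl.items():
        value = pattern.sub(replacement, value)
    return value

values = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', '0']
# 'A' -> 'DDD' (first pattern) -> each 'D' replaced via [CDEFG]
print(replace_in_sequence('A', unanchored))  # BBB WTTBBB WTTBBB WTT
print([replace_in_sequence(v, unanchored) for v in values])
```

Under this model every row of the question's unexpected output is reproduced, which is why anchoring each pattern to the whole value breaks the chain.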

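Use anchors (^ and $) so that each pattern only matches a value as a whole (the digit class is also widened to [0-9]+ so that multi-digit strings match too):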

repl_dict = {re.compile('^[ABH-LP-Z]$'): 'DDD',
             re.compile('^[CDEFG]$'): 'BBB WTT',
             re.compile('^[MNO]$'): 'AAA WTT',
             re.compile('^[0-9]+$'): 'CCC'}

df['Col_1'].replace(repl_dict, regex=True) then produces:

0         DDD
1         DDD
2     BBB WTT
3     BBB WTT
4     BBB WTT
5     BBB WTT
6     BBB WTT
7         DDD
8         DDD
9         DDD
10        CCC
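If you would rather not depend on the order in which replace() applies the dictionary, one alternative (a different technique from the answer above, offered as a sketch) is to test each original value once against an ordered list of patterns and return the first match. It assumes Col_1 holds strings; the names rules and first_full_match are hypothetical. re.Pattern.fullmatch requires the whole string to match, so no ^/$ anchors are needed.

```python
import re

# Hypothetical ordered (pattern, replacement) pairs; the first full match wins.
rules = [(re.compile('[ABH-LP-Z]'), 'DDD'),
         (re.compile('[CDEFG]'), 'BBB WTT'),
         (re.compile('[MNO]'), 'AAA WTT'),
         (re.compile('[0-9]+'), 'CCC')]

def first_full_match(value):
    """Return the replacement of the first pattern that matches the whole
    value; values that match no pattern are returned unchanged."""
    for pattern, replacement in rules:
        if pattern.fullmatch(value):
            return replacement
    return value

values = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', '0']
print([first_full_match(v) for v in values])
```

With pandas, the same function can be passed to Series.map, e.g. df['Col_1'].map(first_full_match). Because each value is matched against the original string only, a replacement is never fed into a later pattern.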
