Replace all but the last occurrence of a character in a string with pandas

Question

I'm using pandas to remove all but the last period in a string, like so:

import pandas as pd

s = pd.Series(['1.234.5','123.5','2.345.6','678.9'])
counts = s.str.count(r'\.')
target = counts==2
target
0     True
1    False
2     True
3    False
dtype: bool

s = s[target].str.replace(r'\.', '', n=1, regex=True)
s
0    1234.5
2    2345.6
dtype: object

My desired output, however, is:

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

The replace command together with the mask target seems to drop the unreplaced values, and I can't see how to remedy this.
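A minimal sketch of one way to remedy the approach in the question, assuming the goal is only to keep the unmasked rows: the rows disappear because s = s[target]... rebinds s to a Series that holds only the masked rows. Writing the replaced values back into the full Series with .loc keeps every row. Note that n=1 still handles only strings with exactly two periods; the answer below generalizes to any number.

```python
import pandas as pd

s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9'])

# True where the string has exactly two periods (same mask as in the question)
target = s.str.count(r'\.') == 2

# Assign into the masked rows of the full Series instead of rebinding s
# to the filtered subset; rows outside the mask keep their original value.
s.loc[target] = s[target].str.replace('.', '', n=1, regex=False)
print(s.tolist())  # ['1234.5', '123.5', '2345.6', '678.9']
```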

Recommended answer

Regex-based with str.replace

This regex pattern with str.replace should do nicely.

s.str.replace(r'\.(?=.*?\.)', '', regex=True)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

The idea is to remove every period that has another period somewhere after it. Only the last period has none, so it is the one that survives. Here's a breakdown of the regular expression used.

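\.     # a literal '.'
(?=    # positive lookahead: assert, without consuming anything, that...
.*?    # ...any run of characters (lazy; does not cross newlines)
\.     # ...is followed by another '.'
)      # end of lookahead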
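As a quick check of the lookahead, the same pattern can be applied with Python's re.sub to single strings, and generalized to any character by escaping it with re.escape. The helper name remove_all_but_last is hypothetical; it is not part of pandas or of the original answer.

```python
import re

import pandas as pd

# Matches a '.' only if another '.' appears somewhere after it
pattern = r'\.(?=.*?\.)'

print(re.sub(pattern, '', '1.234.5'))  # 1234.5
print(re.sub(pattern, '', '1.2.3.4'))  # 123.4
print(re.sub(pattern, '', '678'))      # 678 (no period, nothing to remove)


def remove_all_but_last(series: pd.Series, ch: str) -> pd.Series:
    """Hypothetical helper: drop every occurrence of ch except the last one."""
    esc = re.escape(ch)  # so that characters like '.', '+', '-' are matched literally
    return series.str.replace(rf'{esc}(?=.*?{esc})', '', regex=True)


print(remove_all_but_last(pd.Series(['a-b-c', 'x-y', 'z']), '-').tolist())
# ['ab-c', 'x-y', 'z']
```

The lazy .*? inside the lookahead does not match newlines, so if your strings can contain them, passing flags=re.DOTALL to str.replace lets the lookahead see periods on later lines.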


Fun with np.vectorize

If you want to do this using count, it isn't impossible, but it is a challenge. np.vectorize makes it easier. First, define a function:

def foo(r, c):
    # Remove the first c periods from string r (str.replace with a count)
    return r.replace('.', '', c)
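The trick relies on the count argument of Python's built-in str.replace, which replaces at most that many occurrences, scanning from the left. Removing (number of periods - 1) of them therefore leaves only the last one. A quick check on single strings:

```python
def foo(r, c):
    # Replace at most c occurrences of '.', left to right
    return r.replace('.', '', c)


print(foo('1.234.5', 1))  # 1234.5  (first of two periods removed)
print(foo('123.5', 0))    # 123.5   (count 0 replaces nothing)
print(foo('678', -1))     # 678     (negative count means "all", but there is no '.')
```

The last case matters because a string with no period gets a count of 0 - 1 = -1 from the subtraction below, and str.replace treats a negative count as "replace all". That is harmless here, since there is nothing to replace.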

Vectorize it:

import numpy as np

v = np.vectorize(foo)

Now call v, passing s and the number of periods to remove from each string (its period count minus one).

pd.Series(v(s, s.str.count(r'\.') - 1))

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Keep in mind that this is basically a glorified loop.

The pure-Python equivalent of the vectorized call would be:

r = []
for x, y in zip(s, s.str.count(r'\.') - 1):
    r.append(x.replace('.', '', y))

pd.Series(r)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Or, using a list comprehension:

pd.Series([x.replace('.', '', y) for x, y in zip(s, s.str.count(r'\.') - 1)])

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object
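A different technique, not part of the original answer: split each string at its last period with Series.str.rpartition, strip the periods from the part before it, and glue the pieces back together. This avoids both the regex lookahead and an explicit loop in user code. The extra '42' entry is an assumption added to show a string with no period.

```python
import pandas as pd

s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9', '42'])

# Split at the LAST '.': column 0 = before it, 1 = the '.' (or ''), 2 = after it
parts = s.str.rpartition('.')

# Remove periods only from the part before the last one, then reassemble
result = parts[0].str.replace('.', '', regex=False) + parts[1] + parts[2]
print(result.tolist())  # ['1234.5', '123.5', '2345.6', '678.9', '42']
```

For a string with no period, rpartition returns ('', '', original), so the reassembled value is unchanged.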
