Replace all but the last occurrence of a character in a string with pandas
Problem description
I am using pandas to remove all but the last period in a string, like so:
import pandas as pd

s = pd.Series(['1.234.5','123.5','2.345.6','678.9'])
counts = s.str.count(r'\.')
target = counts==2
target
0 True
1 False
2 True
3 False
dtype: bool
s = s[target].str.replace(r'\.', '', n=1, regex=True)
s
0 1234.5
2 2345.6
dtype: object
My desired output, however, is:
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
The replace call combined with the boolean mask target seems to be dropping the unreplaced values, and I can't see how to remedy this.
Recommended answer
Regex-based with str.replace
This regex pattern with str.replace should do nicely.
s.str.replace(r'\.(?=.*?\.)', '', regex=True)
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
The idea is that a period is removed only while another period still follows it later in the string, so the last one always survives. Here's a breakdown of the regular expression used.
\. # '.'
(?= # positive lookahead
.*? # match anything
\. # look for '.'
)
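To see the lookahead in isolation, here is a minimal sketch that applies the same pattern with Python's built-in re module and then with pandas. The helper name drop_all_but_last_dot is made up for illustration. It also assumes a recent pandas: since version 2.0, str.replace treats the pattern as a literal string unless regex=True is passed, so the call below sets it explicitly.

```python
import re

import pandas as pd

# A period is matched only when another period appears later in the string.
PATTERN = r'\.(?=.*?\.)'


def drop_all_but_last_dot(text):
    """Hypothetical helper: remove every '.' except the last one."""
    return re.sub(PATTERN, '', text)


s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9'])

# Plain-Python version, one string at a time.
plain = [drop_all_but_last_dot(x) for x in s]

# pandas version; regex=True is passed explicitly (assumption: pandas >= 2.0,
# where the default is a literal, non-regex match).
result = s.str.replace(PATTERN, '', regex=True)
```

Both versions handle any number of periods per string, which the count == 2 mask in the question does not.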
Fun with np.vectorize
If you want to do this using count, it isn't impossible, but it is a challenge. You can make this easier with np.vectorize. First, define a function:
def foo(r, c):
    # Remove the first c occurrences of '.' from the string r.
    return r.replace('.', '', c)
Vectorize it:
import numpy as np

v = np.vectorize(foo)
Now, call the function v, passing s and the counts to replace.
pd.Series(v(s, s.str.count(r'\.') - 1))
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
Keep in mind that np.vectorize is basically a glorified loop: it calls foo once per element, so it buys convenience rather than speed.
The plain-Python equivalent of vectorize would be:
r = []
for x, y in zip(s, s.str.count(r'\.') - 1):
    # Drop all but the last '.' from each string.
    r.append(x.replace('.', '', y))

pd.Series(r)
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
Or, using a list comprehension:
pd.Series([x.replace('.', '', y) for x, y in zip(s, s.str.count(r'\.') - 1)])
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
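As for why the original attempt lost rows: s[target] selects only the masked rows, and assigning that result back to s overwrites the whole Series with the subset. A minimal sketch of one way to keep the other rows is to write into the masked positions with .loc instead. It assumes at most two periods per string, as in the sample data, and the name fixed is chosen here for illustration.

```python
import pandas as pd

s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9'])

# Rows with exactly two periods, as in the question.
target = s.str.count(r'\.') == 2

# Write the replaced values back into the masked rows only;
# rows where target is False are left untouched.
fixed = s.copy()
fixed.loc[target] = fixed.loc[target].str.replace('.', '', n=1, regex=False)
```

This keeps the original index and all four rows, but unlike the lookahead pattern it leaves strings with three or more periods unchanged.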