Replacing a few values in a column based on a list in Python
Question
There is a well-explained topic on Stack Overflow: Replacing few values in a pandas dataframe column with another value
The example there is:
BrandName Specialty
A H
B I
ABC J
D K
AB L
The solution is:
df['BrandName'] = df['BrandName'].replace(['ABC', 'AB'], 'A')
The problem is that my dataframe is a little different: a single cell can hold two strings (see "ABC B" below):
BrandName Specialty
A H
B I
ABC B J
D K
AB L
The desired output is still:
BrandName Specialty
A H
B I
A B J
D K
A L
How can I achieve this?
Recommended answer
Use regex=True for substring replacement:
df['BrandName'] = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)
print (df)
BrandName Specialty
0 A H
1 B I
2 A B J
3 D K
4 A L
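To make the difference concrete, here is a small self-contained sketch that rebuilds the question's dataframe and runs both calls side by side. The variable names exact and substring are only illustrative; the data and the replace calls are the ones shown above.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    'BrandName': ['A', 'B', 'ABC B', 'D', 'AB'],
    'Specialty': ['H', 'I', 'J', 'K', 'L'],
})

# Without regex=True only cells that equal 'ABC' or 'AB' exactly are replaced,
# so 'ABC B' is left untouched.
exact = df['BrandName'].replace(['ABC', 'AB'], 'A')

# With regex=True each list item is treated as a pattern and matched
# anywhere inside the cell, so 'ABC B' becomes 'A B'.
substring = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)

print(exact.tolist())
print(substring.tolist())
```

Because the list items are interpreted as regular expressions here, values containing characters such as . or + would probably need to be passed through re.escape first.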
A different solution is needed if values inside other substrings must not be replaced (for example, ABCD should stay unchanged). In that case, use regex word boundaries:
print (df)
BrandName Specialty
0 A ABCD H
1 B I
2 ABC B J
3 D K
4 AB L
L = [r"\b{}\b".format(x) for x in ['ABC', 'AB']]
df['BrandName1'] = df['BrandName'].replace(L, 'A', regex=True)
df['BrandName2'] = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)
print (df)
BrandName Specialty BrandName1 BrandName2
0 A ABCD H A ABCD A AD
1 B I B B
2 ABC B J A B A B
3 D K D D
4 AB L A A
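The effect of \b can also be checked outside pandas with the standard re module. The sketch below is only an illustration: it joins the values into one alternation (longest first, escaped with re.escape), which is an assumption of this example rather than something the answer does, and compares the pattern with and without word boundaries on the same sample strings.

```python
import re

words = ['ABC', 'AB']

# Longest first so 'ABC' is tried before its prefix 'AB';
# re.escape guards against regex metacharacters in the values.
alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))

plain = re.compile(alternation)                        # matches anywhere
bounded = re.compile(r"\b(?:" + alternation + r")\b")  # whole words only

samples = ['A ABCD', 'B', 'ABC B', 'D', 'AB']
plain_result = [plain.sub('A', s) for s in samples]
bounded_result = [bounded.sub('A', s) for s in samples]

print(plain_result)    # 'A ABCD' turns into 'A AD'
print(bounded_result)  # 'A ABCD' is left alone
```

\b matches only at a transition between a word character and a non-word character, so inside ABCD neither ABC nor AB is followed by a boundary and nothing is replaced.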
Edit (from the asker):
To speed it up, have a look at: Speed up millions of regex replacements in Python 3
The best approach is the trie approach:
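import re

# Trie is the helper class from the linked answer; it is not defined here.
def trie_regex_from_words(words):
    trie = Trie()
    for word in words:
        trie.add(word)
    # re.IGNORECASE makes the match case-insensitive.
    return re.compile(r"\b" + trie.pattern() + r"\b", re.IGNORECASE)

# strings is presumably the list of values to replace, e.g. ['ABC', 'AB'].
union = trie_regex_from_words(strings)
df['BrandName'] = df['BrandName'].replace(union, 'A', regex=True)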
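The snippet above depends on a Trie class with add() and pattern() methods, which comes from the linked answer and is not reproduced here. As a rough idea of what such a class does, below is a minimal stand-in written for this page. It is a hypothetical sketch, not the linked answer's implementation, and it skips optimizations such as character classes.

```python
import re

class Trie:
    """Hypothetical minimal trie that can emit an equivalent regex pattern."""

    def __init__(self):
        self.children = {}    # next character -> child node
        self.is_end = False   # True if a word ends at this node

    def add(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_end = True

    def pattern(self):
        # One alternative per child: the escaped character followed by
        # the child's own sub-pattern.
        alts = [re.escape(ch) + child.pattern()
                for ch, child in sorted(self.children.items())]
        if not alts:
            return ""
        body = "(?:" + "|".join(alts) + ")"
        # If a word already ends here, whatever follows is optional.
        return body + "?" if self.is_end else body

trie = Trie()
for word in ['ABC', 'AB']:
    trie.add(word)

union = re.compile(r"\b" + trie.pattern() + r"\b")
results = [union.sub('A', s) for s in ['A ABCD', 'ABC B', 'AB']]
print(trie.pattern())
print(results)
```

Shared prefixes are stored once, so the regex engine does not retry every word from scratch at each position. That is the reason the trie approach tends to scale better than a flat alternation when the list of values is long.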