Check for words from list and remove those words in pandas dataframe column


Problem description

I have the following list:

remove_words = ['abc', 'deff', 'pls']

The following is my data frame, which has a column named 'string':

     data['string']

0    abc stack overflow
1    abc123
2    deff comedy
3    definitely
4    pls lkjh
5    pls1234

I want to check the pandas data frame column for words from the remove_words list and remove those words from the column. A word should only be removed when it occurs on its own, not when it is part of a longer word.

For example, if 'abc' appears as a standalone word in the column, replace it with ''; if it appears as part of a longer token such as abc123, leave it as it is. The expected output here is:

     data['string']

0    stack overflow
1    abc123
2    comedy
3    definitely
4    lkjh
5    pls1234
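
To make the "standalone word" requirement concrete, the condition can be written as a check on whitespace-separated tokens. This is only an illustrative sketch of the requirement, not part of the original question: the helper name `has_standalone_word` is hypothetical, and it assumes words are separated by whitespace only (no punctuation handling).

```python
remove_words = ['abc', 'deff', 'pls']

samples = [
    'abc stack overflow',
    'abc123',
    'deff comedy',
    'definitely',
    'pls lkjh',
    'pls1234',
]


def has_standalone_word(text, words):
    """Return True if any whitespace-separated token of text is in words."""
    return any(token in words for token in text.split())


# Hypothetical helper: flags the rows that should change.
word_set = set(remove_words)
flags = [has_standalone_word(s, word_set) for s in samples]

for s, flag in zip(samples, flags):
    print(f'{s!r}: {"modify" if flag else "leave as is"}')
```

Rows 0, 2 and 4 are flagged for modification, while 'abc123', 'definitely' and 'pls1234' are left alone, which matches the expected output listed above.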

In my actual data, the remove_words list has 2,000 words and the pandas data frame has 5 billion records, so I am looking for the most efficient way to do this.

I have tried a few things in Python without much success. Can anybody help me with this? Any ideas would be helpful.

Thanks

Recommended answer

Try this:

In [98]: pat = r'\b(?:{})\b'.format('|'.join(remove_words))

In [99]: pat
Out[99]: '\\b(?:abc|deff|pls)\\b'

In [100]: df['new'] = df['string'].str.replace(pat, '', regex=True)

In [101]: df
Out[101]:
               string              new
0  abc stack overflow   stack overflow
1              abc123           abc123
2         deff comedy           comedy
3          definitely       definitely
4            pls lkjh             lkjh
5             pls1234          pls1234
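
The session above can be collected into a self-contained script. The following is a minimal sketch that makes a few assumptions beyond the original answer: each word is passed through `re.escape` in case the real 2,000-word list contains regex metacharacters, `regex=True` is passed explicitly because pandas 2.0 changed the default of `Series.str.replace` to literal matching, and a whitespace cleanup step is added so the result matches the expected output exactly (removing a word otherwise leaves the neighbouring space behind).

```python
import re

import pandas as pd

remove_words = ['abc', 'deff', 'pls']

df = pd.DataFrame({'string': [
    'abc stack overflow',
    'abc123',
    'deff comedy',
    'definitely',
    'pls lkjh',
    'pls1234',
]})

# Build one alternation pattern; \b restricts matches to whole words.
# re.escape is an added precaution so list entries are matched literally.
pat = r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in remove_words))

df['new'] = (
    df['string']
    .str.replace(pat, '', regex=True)      # remove the listed whole words
    .str.replace(r'\s+', ' ', regex=True)  # collapse gaps left by removals
    .str.strip()                           # drop leading/trailing spaces
)

print(df)
```

Building a single pattern means each string is scanned once per `str.replace` call rather than once per word in the list. Whether this is fast enough for billions of rows is not something the original answer measures, so it is worth timing on a sample of the real data first.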
