Replace words by checking from pandas dataframe
Problem description
I have a dataframe like the one below.
ID Word Synonyms
------------------------
1 drove drive
2 office downtown
3 everyday daily
4 day daily
5 work downtown
I'm reading a sentence and would like to replace words in that sentence with their synonyms as defined above. Here is my code:
import nltk
import pandas as pd
import string

sdf = pd.read_excel('C:\synonyms.xlsx')
sd = sdf.apply(lambda x: x.astype(str).str.lower())
words = 'i drove to office everyday in my car'
#######
def tokenize(text):
    text = ''.join([ch for ch in text if ch not in string.punctuation])
    tokens = nltk.word_tokenize(text)
    synonym = synonyms(tokens)
    return synonym

def synonyms(words):
    for word in words:
        if(sd[sd['Word'] == word].index.tolist()):
            idx = sd[sd['Word'] == word].index.tolist()
            word = sd.loc[idx]['Synonyms'].item()
        else:
            word
    return word

print(tokenize(words))
The code above tokenizes the input sentence. I would like to achieve the following output:
In: i drove to office everyday in my car
Out: i drive to downtown daily in my car
But the output I get is
Out: car
If I skip the synonyms function, then my output has no issues and is split into individual words. I am trying to understand what I'm doing wrong in the synonyms function. Also, please advise if there is a better solution to this problem.
Recommended answer
I would take advantage of Pandas/NumPy indexing. Since your synonym mapping is many-to-one (each word has exactly one synonym), you can re-index using the Word column.
sd = sd.applymap(str.strip).applymap(str.lower).set_index('Word').Synonyms
print(sd)
Word
drove drive
office downtown
everyday daily
day daily
Name: Synonyms, dtype: object
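For readers without the spreadsheet, the lookup can be reproduced from inline data. This is only a sketch: the DataFrame below is a hand-typed stand-in for the Excel file (an assumption based on the table in the question), and the name lookup is introduced here for illustration. It also drops repeated Word entries, because label-based alignment expects a unique index.

```python
import pandas as pd

# Inline stand-in for the spreadsheet (assumption: same columns as in the question).
sdf = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Word": ["drove", "office", "everyday", "day", "work"],
    "Synonyms": ["drive", "downtown", "daily", "daily", "downtown"],
})

# Normalise only the two text columns, then index the synonyms by word.
lookup = (
    sdf[["Word", "Synonyms"]]
    .apply(lambda col: col.astype(str).str.strip().str.lower())
    .drop_duplicates(subset="Word")  # keep the first synonym if a word repeats
    .set_index("Word")["Synonyms"]
)

print(lookup.index.is_unique)
print(lookup["drove"])
```

Selecting just the Word and Synonyms columns leaves the ID column out of the string clean-up, so it never needs to be cast to text.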
Then, you can easily align a list of tokens to their respective synonyms.
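words = nltk.word_tokenize(u'i drove to office everyday in my car')
# reindex gives NaN for tokens without a synonym; on recent pandas versions,
# sd[words] raises a KeyError when some labels are missing
sentence = sd.reindex(words).reset_index()
print(sentence)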
Word Synonyms
0 i NaN
1 drove drive
2 to NaN
3 office downtown
4 everyday daily
5 in NaN
6 my NaN
7 car NaN
Now, it remains to use the tokens from Synonyms, falling back to Word. This can be achieved with
sentence = sentence.Synonyms.fillna(sentence.Word)
print(sentence.values)
[u'i' 'drive' u'to' 'downtown' 'daily' u'in' u'my' u'car']
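As for what goes wrong in the original synonyms function: each pass of the loop only rebinds the local name word, nothing is collected, and return word runs once the loop has finished, so only the last token (car) comes back. A minimal sketch of a corrected loop follows. It assumes a plain dict (SYNONYMS, a hypothetical stand-in for the spreadsheet) and uses str.split in place of nltk.word_tokenize so that it runs without extra dependencies.

```python
# Hypothetical stand-in for the spreadsheet: a plain dict built from the table.
SYNONYMS = {
    "drove": "drive",
    "office": "downtown",
    "everyday": "daily",
    "day": "daily",
    "work": "downtown",
}


def synonyms(words, mapping=SYNONYMS):
    """Return a new list with every known word replaced by its synonym."""
    replaced = []
    for word in words:
        # dict.get falls back to the word itself when there is no synonym.
        replaced.append(mapping.get(word, word))
    return replaced  # return the whole list, not the last loop variable


tokens = "i drove to office everyday in my car".split()
result = " ".join(synonyms(tokens))
print(result)
```

The same final step applies to the pandas result above: ' '.join(sentence) turns the Series of tokens back into a single sentence.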