Predicting Missing Words in a Sentence - Natural Language Processing Model
Question
I have the following sentence:
I want to ____ the car because it is cheap.
I want to predict the missing word using an NLP model. Which NLP model should I use? Thanks.
Recommended answer
TL;DR
Try this: https://github.com/huggingface/pytorch-pretrained-BERT
First you have to set it up properly with
pip install -U pytorch-pretrained-bert
Then you can use the "masked language model" from the BERT algorithm, e.g.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
import logging
logging.basicConfig(level=logging.INFO)
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text = '[CLS] I want to [MASK] the car because it is cheap . [SEP]'
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Create the segments tensors.
segments_ids = [0] * len(tokenized_text)
# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
# Load pre-trained model (weights)
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
# Predict all tokens
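with torch.no_grad():
    predictions = model(tokens_tensor, segments_tensors)
# Locate the [MASK] token, then pick the highest-scoring vocabulary entry for that position
masked_index = tokenized_text.index('[MASK]')
predicted_index = torch.argmax(predictions[0, masked_index]).item()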
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
print(predicted_token)
[out]:
buy
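The snippet above prints only the single best token. To look at several candidates, one option is to rank the scores at the masked position. Below is a minimal sketch, assuming the scores have already been turned into a plain Python list (with the real model that could be `predictions[0, masked_index].tolist()`). The helper name `top_k_indices` and the toy scores are hypothetical and only serve as illustration.

```python
import heapq


def top_k_indices(scores, k=5):
    """Return the indices of the k highest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


# Toy scores standing in for predictions[0, masked_index].tolist();
# a real BERT vocabulary has tens of thousands of entries.
toy_scores = [0.1, 2.5, -1.0, 3.2, 0.7]
best = top_k_indices(toy_scores, k=3)
print(best)
```

With the real model, the resulting indices could then be passed to `tokenizer.convert_ids_to_tokens(...)` as in the code above. `torch.topk` would do the same ranking directly on the tensor.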
In Long
To truly understand why you need the [CLS], [MASK] and segment tensors, please do read the paper carefully, https://arxiv.org/abs/1810.04805
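As a rough illustration of the segment tensors: BERT marks tokens of the first sentence (through the first [SEP]) with segment 0 and tokens of an optional second sentence with segment 1. The sketch below assumes hand-written token lists instead of real tokenizer output, and `build_segment_ids` is a hypothetical helper, not part of the library.

```python
def build_segment_ids(tokens):
    """Assign segment 0 up to and including the first [SEP], segment 1 afterwards."""
    segment_ids = []
    current = 0
    for token in tokens:
        segment_ids.append(current)
        if token == '[SEP]':
            # Everything after the first separator belongs to the second sentence
            current = 1
    return segment_ids


# Hand-written tokens for illustration only (not produced by BertTokenizer)
single = ['[CLS]', 'i', 'want', 'to', '[MASK]', 'the', 'car', '[SEP]']
pair = ['[CLS]', 'the', 'car', 'is', 'cheap', '[SEP]', 'i', '[MASK]', 'it', '[SEP]']

print(build_segment_ids(single))
print(build_segment_ids(pair))
```

For a single sentence this gives all zeros, which is why the TL;DR code can simply use `[0] * len(tokenized_text)`.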
And if you're lazy, you can read this nice blogpost from Lilian Weng, https://lilianweng.github.io/lil-log/2019/01/31/generalized-language-models.html
Other than BERT, there are a lot of other models that can perform the task of filling in the blank. Do look at the other models in the pytorch-pretrained-BERT repository, but more importantly dive deeper into the task of "Language Modeling", i.e. the task of predicting the next word given a history.
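To make "predicting the next word given a history" concrete, here is a toy sketch of a count-based bigram model, where the history is just the previous word. It is a deliberately simple stand-in, not how BERT works; the corpus and the function names `train_bigram` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, history_word):
    """Return the most frequent follower of history_word, or None if unseen."""
    followers = counts.get(history_word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus for illustration only
toy_corpus = [
    'I want to buy the car',
    'I want to sell the car',
    'I want to buy the house',
]
bigram_counts = train_bigram(toy_corpus)
print(predict_next(bigram_counts, 'to'))
```

Note that this kind of model only sees the words to the left of the blank, whereas BERT's masked language model uses the context on both sides, which suits the fill-in-the-blank setting of the question.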