How to handle <UKN> tokens in text generation

Views: 146 Posted: 2020/5/4 10:29:15 Tags: machine-learning neural-network nlp word2vec recurrent-neural-network


Problem description

In my text generation dataset, I have converted all infrequent words into the <unk> token (unknown word), as suggested by most of the text-generation literature.
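The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual pipeline: the helper names `build_vocab` and `replace_rare`, the `min_count` threshold, the whitespace tokenization, and the two-sentence corpus are all assumptions made for the example.

```python
from collections import Counter

UNK = "<unk>"  # assumed spelling of the unknown-word token


def build_vocab(sentences, min_count=2):
    """Return the set of words that occur at least `min_count` times."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    return {word for word, n in counts.items() if n >= min_count}


def replace_rare(sentence, vocab):
    """Replace every word that is not in `vocab` with the UNK token."""
    return " ".join(word if word in vocab else UNK for word in sentence.split())


# Hypothetical toy corpus: "kettle" and "lamp" each appear only once.
corpus = [
    "I went to the mall and bought a kettle and some groceries",
    "I went to the mall and bought a lamp and some groceries",
]
vocab = build_vocab(corpus, min_count=2)
print(replace_rare(corpus[0], vocab))
```

With this toy corpus, the rare word in the first sentence is mapped to `<unk>` while every word seen at least twice is kept.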

However, when training an RNN to take part of a sentence as input and predict the rest of the sentence, I am not sure how I should stop the network from generating <unk> tokens. When the network encounters an unknown (infrequent) word in the training set, what should its output be?

Example:
Sentence: I went to the mall and bought a <unk> and some groceries
Network input: I went to the mall and bought a
Current network output: <unk> and some groceries
Desired network output: ??? and some groceries

What should it output instead of <unk>?

I don't want to build a generator that outputs words it does not know.
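For context, one commonly used option (not taken from the original post, which includes no answer here) is to leave <unk> in the training targets and instead exclude it when sampling at generation time, by forcing its probability to zero before drawing the next token. The sketch below assumes the model exposes a list of per-token logits; the function name `sample_without_unk`, the toy vocabulary, and the logit values are hypothetical.

```python
import math
import random


def sample_without_unk(logits, unk_id, rng=random):
    """Sample a token id from softmax(logits) with the unk_id probability forced to zero."""
    # Setting the <unk> logit to -inf gives it weight exp(-inf) == 0.0.
    masked = [(-math.inf if i == unk_id else x) for i, x in enumerate(logits)]
    peak = max(masked)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in masked]
    # random.choices normalizes the weights, so no explicit softmax division is needed.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical vocabulary and logits: the model strongly prefers <unk> here.
vocab = ["<unk>", "kettle", "lamp", "and"]
logits = [5.0, 1.0, 0.5, 0.1]
token_id = sample_without_unk(logits, unk_id=0)
print(vocab[token_id])
```

Under this scheme the network is still trained on sequences containing <unk>, but the decoder can only ever emit words from the known vocabulary.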
