What does a weighted word embedding mean?


Problem Description

In the paper I am trying to implement:

In this work, tweets were modeled using three types of text representation. The first is a bag-of-words model weighted by tf-idf (term frequency - inverse document frequency) (Section 2.1.1). The second represents a sentence by averaging the word embeddings of all words in the sentence, and the third represents a sentence by averaging the weighted word embeddings of all words, where the weight of a word is given by tf-idf (Section 2.1.2).

I am not sure about the third representation, the weighted word embeddings where the weight of each word is given by tf-idf. I am not even sure whether they can be used together.

Answer

Averaging (possibly weighted) word embeddings makes sense, though depending on the main algorithm and the training data, this sentence representation may not be the best. The intuition is the following:

  • You may have to deal with sentences of different lengths, hence the averaging (which is better than a plain sum).
  • Some words in a sentence are usually more valuable than others, and TF-IDF is the simplest measure of a word's value. Note that the scale of the result does not change.
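The tf-idf weighted sentence embedding (the third representation) can be sketched as follows. This is a minimal illustration: the corpus, the 3-dimensional embeddings, and the smoothed idf formula are all made-up assumptions; a real system would use pretrained vectors (e.g. word2vec or GloVe) and a library tf-idf implementation:

```python
import math
import numpy as np

# Toy corpus of tokenized "tweets" and toy 3-dimensional embeddings.
# (Real systems would use pretrained vectors; these values are invented.)
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["a", "cat", "ran"],
]
embeddings = {
    "the": np.array([0.1, 0.2, 0.3]),
    "a":   np.array([0.2, 0.1, 0.0]),
    "cat": np.array([0.9, 0.1, 0.4]),
    "dog": np.array([0.8, 0.3, 0.2]),
    "sat": np.array([0.0, 0.5, 0.7]),
    "ran": np.array([0.1, 0.6, 0.6]),
}

def idf(word, docs):
    # Smoothed inverse document frequency.
    df = sum(1 for doc in docs if word in doc)
    return math.log((1 + len(docs)) / (1 + df)) + 1

def mean_embedding(sentence):
    # Representation 2: plain (unweighted) average of word embeddings.
    return np.stack([embeddings[w] for w in sentence]).mean(axis=0)

def tfidf_weighted_embedding(sentence, docs):
    # Representation 3: average of embeddings, each scaled by its tf-idf weight.
    tf = {w: sentence.count(w) / len(sentence) for w in sentence}
    weights = np.array([tf[w] * idf(w, docs) for w in sentence])
    vectors = np.stack([embeddings[w] for w in sentence])
    # Weighted average: frequent-but-common words (low idf) contribute less.
    return (weights[:, None] * vectors).sum(axis=0) / weights.sum()

sent = corpus[0]
print(mean_embedding(sent))                    # simple average
print(tfidf_weighted_embedding(sent, corpus))  # tf-idf weighted average
```

Both functions return a single fixed-size vector per sentence regardless of sentence length, which is why averaging (rather than summing) is used; the weighted variant simply shifts the result toward the more distinctive words.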

See also this paper by Kenter et al. There is a nice post that compares how these two approaches perform across different algorithms and concludes that neither is significantly better than the other: some algorithms favor simple averaging, while others perform better with TF-IDF weighting.

