What is word vector dimension

Problem description

I am currently an amateur in deep learning and was reading about word2vec on this site: https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-3-more-fun-with-word-vectors

For either the CBOW or the skip-gram model, I can see that the dimension of the word vectors is 300 and the vocabulary size is 15,000. What I read in an earlier post is that we can one-hot encode the words as vectors. So I would guess the word vector dimension should be equal to the vocabulary size. Or, to put the question differently: what is this word vector dimension, and how do I visualize it? How do you arrive at this dimension?

Recommended answer

"Word vector dimension" is the dimensionality of the vectors that you train on your training documents. Technically you can choose any dimension, like 10, 100, 300, or even 1000. The industry norm is 300-500: people have experimented with different dimensions (300, 400, 500, ... 1000, etc.) and have not noticed significant performance improvement beyond 300-400. (This also depends on your training data.) As you would expect, more dimensions mean heavier computation. However, if we set the dimension too low, there is not enough vector space to capture the information contained in the entire training corpus.

How to visualize it?

You can't easily visualize a 300-dimensional vector, and visualizing 300-d vectors probably wouldn't be very useful to you anyway. What we can do is project those vectors down to 2-D space, the space we are most familiar with and can understand easily.
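One common way to do that projection, sketched here under the assumption that scikit-learn and matplotlib are available (PCA is just one choice; t-SNE is another popular option), continuing from the `model` trained in the previous snippet:

```python
# Sketch: project 300-d word vectors down to 2-D with PCA and plot them.
# Assumes the `model` trained in the previous snippet (or any trained vectors).
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

words = ["cat", "cats", "dogs", "mat", "pets"]
vectors = [model.wv[w] for w in words]                # each vector is 300-dimensional

coords = PCA(n_components=2).fit_transform(vectors)   # shape: (len(words), 2)

plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))                        # label each point with its word
plt.show()
```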

Your last statement, "So I guess the word vector dimension should be equal to the vocabulary size", is WRONG! The vocabulary size is 171,476 words (the total number of words in English)! The word vector dimension (mostly 300-500; you don't want to train 1-billion-dimensional vectors, do you?) is the size of the vector that you decide on in advance, before training on the data. My video (shameless plug) will help you understand the important word vector concepts: AI with the Best
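To make the contrast concrete, here is a hedged sketch of the difference between a one-hot encoding, whose dimension equals the vocabulary size, and a dense embedding, whose dimension you choose in advance (the word index and random matrix below are purely illustrative):

```python
# Sketch: one-hot dimension equals the vocabulary size,
# while the embedding dimension is a fixed choice such as 300.
import numpy as np

vocab_size = 171_476   # number of English words cited in the answer
embedding_dim = 300    # dense word-vector dimension, chosen before training

# One-hot encoding of a single word: a huge, sparse vector with a single 1.
word_index = 42        # hypothetical index of some word in the vocabulary
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0
print(one_hot.shape)   # (171476,)

# A learned embedding is typically a (vocab_size, embedding_dim) lookup table;
# looking a word up yields a dense 300-d vector, whatever the vocabulary size.
rng = np.random.default_rng(0)
embedding_matrix = rng.standard_normal((vocab_size, embedding_dim), dtype=np.float32)
print(embedding_matrix[word_index].shape)  # (300,)
```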
