word2vec gensim multiple languages

Question

This problem is going completely over my head. I am training a Word2Vec model using gensim. I have provided data in multiple languages i.e. English and Hindi. When I am trying to find the words closest to 'man', this is what I am getting:

model.wv.most_similar(positive = ['man'])
Out[14]: 
[('woman', 0.7380284070968628),
 ('lady', 0.6933152675628662),
 ('monk', 0.6662989258766174),
 ('guy', 0.6513140201568604),
 ('soldier', 0.6491742134094238),
 ('priest', 0.6440571546554565),
 ('farmer', 0.6366012692451477),
 ('sailor', 0.6297377943992615),
 ('knight', 0.6290514469146729),
 ('person', 0.6288090944290161)]

Problem is, these are all English words. Then I tried to find the similarity between Hindi and English words with the same meaning:

model.similarity('man', 'आदमी')
__main__:1: DeprecationWarning: Call to deprecated `similarity` (Method will 
be removed in 4.0.0, use self.wv.similarity() instead).
Out[13]: 0.078265618974427215

This similarity should have been higher than all the other scores above. The Hindi corpus I have was made by translating the English one, hence the words appear in similar contexts and should be close.

This is what I am doing here:

import multiprocessing
from gensim.models import Word2Vec

# Combining the Hindi and English sentences into one corpus.
all_reviews = HindiWordsList + EnglishWordsList

# Training a Word2Vec model (gensim 3.x API: size/iter rather than vector_size/epochs).
cpu_count = multiprocessing.cpu_count()
model = Word2Vec(size=300, window=5, min_count=1, alpha=0.025,
                 workers=cpu_count, max_vocab_size=None, negative=10)
model.build_vocab(all_reviews)
model.train(all_reviews, total_examples=model.corpus_count, epochs=model.iter)
model.save("word2vec_combined_50.bin")

Answer

First of all, you should really use self.wv.similarity().
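
For example, the deprecated call from the question can be rewritten to go through the wv attribute; a minimal sketch, assuming the model saved in the training snippet above is reloaded:

from gensim.models import Word2Vec

# Reload the model saved in the question's training code.
model = Word2Vec.load("word2vec_combined_50.bin")

# Non-deprecated equivalents of the calls used in the question.
print(model.wv.similarity('man', 'आदमी'))
print(model.wv.most_similar(positive=['man']))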

I'm assuming there are close to no words shared between your Hindi corpus and your English corpus, since the Hindi corpus is in Devanagari and the English one is, well, in English. Simply adding the two corpora together to build one model does not make sense: corresponding words in the two languages co-occur across the two versions of a document, but never within the same sentences, so the word embeddings give Word2Vec nothing to work with when figuring out which words are most similar.

E.g. until your model knows that

Man:Aadmi::Woman:Aurat,

from the word embeddings, it can never make out the

Raja:King::Rani:Queen

relation. And for that, you need some anchor between the two corpora. Here are a few suggestions that you can try out:

  1. Build a separate Hindi corpus/model.
  2. Manually create and maintain some data of English -> Hindi word pairs and look words up in it.
  3. During training, randomly replace words in an input document with the corresponding words from the translated version of that document (see the sketch after this list).
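
A minimal sketch of the third suggestion, assuming a small hand-made English -> Hindi anchor dictionary (en_hi, a hypothetical name, essentially what suggestion 2 would produce) and the tokenized all_reviews list from the question:

import random

# Hypothetical hand-made anchor pairs (suggestion 2); extend with as many pairs as you can maintain.
en_hi = {'man': 'आदमी', 'woman': 'औरत', 'king': 'राजा', 'queen': 'रानी'}
hi_en = {hindi: english for english, hindi in en_hi.items()}

def randomly_swap(sentence, p=0.3):
    # With probability p, replace an anchor word with its translation so that
    # English and Hindi words start to appear in shared contexts during training.
    swapped = []
    for word in sentence:
        if word in en_hi and random.random() < p:
            swapped.append(en_hi[word])
        elif word in hi_en and random.random() < p:
            swapped.append(hi_en[word])
        else:
            swapped.append(word)
    return swapped

# Apply to the combined corpus from the question before build_vocab()/train().
all_reviews = [randomly_swap(sentence) for sentence in all_reviews]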

These might be enough to give you an idea. You can also look into seq2seq if you only want to do translations. You can also read up on the theory behind Word2Vec in detail to understand what it does.
