FastText using pre-trained word vectors for text classification
Question
I am working on a text classification problem: given some text, I need to assign it labels from a given set.
I have tried using the fastText library by Facebook, which has two utilities of interest to me:
A) Word Vectors with pre-trained models
B) Text Classification utilities
However, it seems that these are completely independent tools as I have been unable to find any tutorials that merge these two utilities.
What I want is to classify some text by taking advantage of the pre-trained word-vector models. Is there any way to do this?
Recommended answer
FastText's native classification mode depends on you training the word-vectors yourself, using texts with known classes. The word-vectors thus become optimized to be useful for the specific classifications observed during training. So that mode typically wouldn't be used with pre-trained vectors.
If using pre-trained word-vectors, you'd instead compose those into a text-vector yourself (for example, by averaging the vectors of all the words in a text), then train a separate classifier (such as one of the many options from scikit-learn) using those text-vectors as features.
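The averaging approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny `WORD_VECTORS` dictionary is a made-up stand-in for real pre-trained vectors, the helper names `text_to_vector` and `classify` are hypothetical, and `LogisticRegression` is just one of the scikit-learn classifiers that could be used here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# ASSUMPTION: a toy 3-dimensional lookup table standing in for real
# pre-trained word vectors, which would normally be loaded from a model file.
WORD_VECTORS = {
    "good": np.array([1.0, 0.2, 0.0]),
    "great": np.array([0.9, 0.1, 0.1]),
    "bad": np.array([-1.0, 0.1, 0.0]),
    "awful": np.array([-0.9, 0.2, 0.1]),
    "movie": np.array([0.0, 1.0, 0.3]),
    "film": np.array([0.1, 0.9, 0.4]),
}
DIM = 3


def text_to_vector(text):
    """Average the vectors of the known words in a text.

    Unknown words are skipped; a text with no known words maps to zeros.
    """
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


# Hypothetical labelled training texts.
train_texts = ["good movie", "great film", "bad movie", "awful film"]
train_labels = ["positive", "positive", "negative", "negative"]

# One averaged text-vector per row, used as the feature matrix.
X_train = np.vstack([text_to_vector(t) for t in train_texts])

# A separate classifier trained on the averaged vectors.
classifier = LogisticRegression()
classifier.fit(X_train, train_labels)


def classify(text):
    """Predict a label for a new text from its averaged word vectors."""
    features = text_to_vector(text).reshape(1, -1)
    return classifier.predict(features)[0]
```

With real data, the lookup in `text_to_vector` would presumably be replaced by vectors from a pre-trained model, for example via `get_word_vector` on a model loaded with `fasttext.load_model` in the fastText Python bindings. Plain averaging ignores word order, so it is a baseline rather than the only way to build text-vectors.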