From features to words in Python ("reverse" bag of words)
Problem description
Using sklearn, I've created a BOW with 200 features in Python, which are easily extracted. But how can I reverse it? That is, how do I go from a vector of 200 0's and 1's to the corresponding words? Since the vocabulary is a dictionary, and thus not ordered, I am not sure which word each element in the feature list corresponds to. Also, if the first element in my 200-dimensional vector corresponds to the first word in the dictionary, how do I then extract a word from the dictionary by index?
The BOW is created this way:
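from sklearn.feature_extraction.text import CountVectorizer
# sw (stop-word list) and data (a pandas DataFrame) are defined elsewhere
vec = CountVectorizer(stop_words=sw, strip_accents="unicode", analyzer="word", max_features=200)
features = vec.fit_transform(data.loc[:, "description"]).todense()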
Thus "features" is an (n, 200) matrix, where n is the number of sentences.
Recommended answer
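I'm not totally sure what you're going for, but it seems like you're just trying to figure out which column represents which word. For this, there is the handy get_feature_names method on the fitted vectorizer: it returns the words in column order, so element i of the result is the word counted in column i of the feature matrix. (In scikit-learn 1.0 the method was renamed get_feature_names_out, and the old name was removed in 1.2.)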