From features to words in Python ("reverse" bag of words)

Problem description

Using sklearn, I've created a bag of words (BOW) with 200 features in Python, and extracting the features is easy. But how can I reverse it, that is, go from a vector of 200 0's and 1's back to the corresponding words? Since the vocabulary is a dictionary, and thus not ordered, I am not sure which word each element in the feature vector corresponds to. Also, if the first element of my 200-dimensional vector corresponds to the first word in the dictionary, how do I then extract a word from the dictionary by index?

The BOW is created this way:

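from sklearn.feature_extraction.text import CountVectorizer
# sw (stop-word list) and data (a pandas DataFrame) come from the asker's setup and are not shown
vec = CountVectorizer(stop_words=sw, strip_accents="unicode", analyzer="word", max_features=200)
features = vec.fit_transform(data.loc[:, "description"]).todense()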

Thus "features" is an (n, 200) matrix, n being the number of sentences.

Recommended answer

I'm not totally sure what you're going for, but it seems like you're just trying to figure out which column represents which word. For this, there is the handy get_feature_names method (renamed get_feature_names_out in scikit-learn 1.0; the old name was removed in 1.2).

To see which column represents which word, use get_feature_names:

>>> vec.get_feature_names()
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

So your first column is "and", the second is "document", and so on. For readability, you can put this in a DataFrame:

>>> pd.DataFrame(features, columns = vec.get_feature_names())
   and  document  first  is  one  second  the  third  this
0    0         1      1   1    0       0    1      0     1
1    0         2      0   1    0       1    1      0     1
2    1         0      0   1    1       0    1      1     1
3    0         1      1   1    0       0    1      0     1
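To go all the way from a 0/1 row back to words, one possible sketch is shown here. It assumes a four-sentence toy corpus in place of the asker's data, uses binary=True to get 0/1 entries, and introduces the illustrative names index_to_word, words_by_column and vector_to_words, none of which appear in the original thread. It relies only on the fitted vocabulary_ attribute and the inverse_transform method of CountVectorizer, so it should not depend on which get_feature_names variant is available.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (an assumption; it stands in for data.loc[:, "description"]).
corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

vec = CountVectorizer(binary=True)  # binary=True yields 0/1 instead of counts
features = vec.fit_transform(corpus).toarray()

# vocabulary_ maps word -> column index, so the column order is fixed by the
# stored index and does not depend on the dictionary's iteration order.
index_to_word = {idx: word for word, idx in vec.vocabulary_.items()}
words_by_column = np.array(
    [index_to_word[i] for i in range(len(index_to_word))]
)

def vector_to_words(row):
    """Return the words whose columns are nonzero in one feature row."""
    mask = np.asarray(row).ravel() > 0
    return words_by_column[mask].tolist()

# Words present in the first sentence, in column order.
first_row_words = vector_to_words(features[0])

# inverse_transform performs the same lookup for every row at once.
all_rows_words = [sorted(terms) for terms in vec.inverse_transform(features)]

print(index_to_word[0])   # word for column 0
print(first_row_words)
```

Looking up a single word by position is then just index_to_word[i] or words_by_column[i], which answers the "extract a word via index" part of the question.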
