Is there a way to convert nltk featuresets into a scipy.sparse array?
Problem description
I'm trying to use scikit-learn, which needs numpy/scipy arrays as input. The featureset generated in NLTK consists of unigram and bigram frequencies. I could convert it manually, but that would be a lot of effort, so I'm wondering whether there's a solution I've overlooked.
Recommended answer
Not that I know of, but note that scikit-learn can do n-gram frequency counting itself. Assuming word-level n-grams:
from sklearn.feature_extraction.text import CountVectorizer

# ngram_range=(1, 2) counts both unigrams and bigrams.
# (The old WordNGramAnalyzer class has been removed from
# scikit-learn; ngram_range is the current API.)
v = CountVectorizer(ngram_range=(1, 2))
X = v.fit_transform(files)
where files is a list of strings (or, if you pass input='file' to CountVectorizer, a list of file-like objects). After this, X is a scipy.sparse matrix of raw frequency counts.