Prepare data for text classification using Scikit Learn SVM

Views: 51  Posted: 2021/7/16 19:53:04  Tags: python, svm, scikit-learn


Question

I'm trying to apply an SVM from scikit-learn to classify the tweets I collected. There will be two categories; call them A and B. For now, I have all the tweets sorted into two text files, 'A.txt' and 'B.txt'. However, I'm not sure what type of data input the scikit-learn SVM expects. I have a dictionary with the labels (A and B) as its keys and, as its values, dictionaries of features (unigrams) and their frequencies. Sorry, I'm really new to machine learning and not sure what I should do to get the SVM to work. I found that the SVM takes a numpy.ndarray as its data input. Do I need to create one from my own data? Should it be something like this?

Labels    features    frequency
  A        'book'        54
  B       'movies'       32

Any help is appreciated.

Answer

Have a look at the documentation on text feature extraction.

Also have a look at the text classification example.

There is also a tutorial here:

http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
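The tutorial chains the vectorizer and the classifier in a Pipeline. A sketch of the same idea applied to this question, assuming LinearSVC (a linear SVM) in place of the kernel-oriented sklearn.svm.SVC, and using made-up training tweets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training tweets standing in for the contents of A.txt and B.txt.
train_texts = [
    "great book, could not put it down",
    "finished another book last night",
    "the new movies this week look fun",
    "watching movies all weekend",
]
train_labels = ["A", "A", "B", "B"]

# TfidfVectorizer is equivalent to CountVectorizer followed by TfidfTransformer.
# LinearSVC fits a linear SVM directly, without SVC's kernel machinery.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)  # raw strings in; vectorizing happens inside

predictions = model.predict(["any good book to read?", "which movies are out?"])
print(predictions)
```

Because the pipeline does the vectorizing, fit and predict take plain strings, and the vocabulary learned during fit is reused automatically at prediction time.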

In particular, don't focus too much on SVM models (and especially not on sklearn.svm.SVC, which is more interesting for kernel models and hence not for text classification): a simple Perceptron, LogisticRegression or Bernoulli naive Bayes model might work as well while being much faster to train.
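One way to check that on the real tweets is to put each classifier behind the same features and compare cross-validated accuracy. A sketch, assuming the tweets are already loaded as a list of strings with a parallel list of labels; compare_classifiers and the toy data are hypothetical:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_classifiers(texts, labels, cv=5):
    """Hypothetical helper: mean cross-validated accuracy of several
    classifiers that all share the same bag-of-words features."""
    candidates = {
        "LinearSVC": LinearSVC(),
        "Perceptron": Perceptron(),
        "LogisticRegression": LogisticRegression(),
        "BernoulliNB": BernoulliNB(),  # binarizes the counts (word present/absent)
    }
    scores = {}
    for name, clf in candidates.items():
        pipeline = make_pipeline(CountVectorizer(), clf)
        scores[name] = cross_val_score(pipeline, texts, labels, cv=cv).mean()
    return scores


# Made-up data; with this few tweets the scores mean nothing and only
# demonstrate the call. On the real tweets, keep the default cv=5.
texts = [
    "great book, could not put it down",
    "finished another book last night",
    "the new movies this week look fun",
    "watching movies all weekend",
]
labels = ["A", "A", "B", "B"]

scores = compare_classifiers(texts, labels, cv=2)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Keeping the vectorizer inside each pipeline means it is refitted on every training fold, so the test fold's words never leak into the vocabulary.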
