Choosing an sklearn pipeline for classifying user text data

Question

I'm working on a machine learning application in Python (using the sklearn module), and am currently trying to decide on a model for performing inference. A brief description of the problem:

Given many instances of user data, I'm trying to classify them into various categories based on relative keyword containment. It is supervised, so I have many, many instances of pre-classified data that are already categorized. (Each piece of data is between 2 and 12 or so words.)

I am currently trying to decide between two potential models:

  1. CountVectorizer + Multinomial Naive Bayes. Use sklearn's CountVectorizer to obtain keyword counts across the training data. Then, use Naive Bayes to classify data using sklearn's MultinomialNB model.

  2. Tf-idf term weighting on keyword counts + standard Naive Bayes. Obtain a keyword count matrix for the training data using CountVectorizer, transform that data to be tf-idf weighted using sklearn's TfidfTransformer, and then dump that into a standard Naive Bayes model. (Both pipelines are sketched just after this list.)
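For concreteness, here is a minimal sketch of the two candidates as sklearn Pipeline objects. It assumes MultinomialNB for both (sklearn's multinomial model accepts tf-idf weights directly), and texts/labels are placeholders for the pre-classified data:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline

    # Option 1: raw keyword counts straight into multinomial Naive Bayes.
    count_nb = Pipeline([
        ("counts", CountVectorizer()),
        ("nb", MultinomialNB()),
    ])

    # Option 2: the same counts, reweighted by tf-idf before Naive Bayes.
    tfidf_nb = Pipeline([
        ("counts", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("nb", MultinomialNB()),
    ])

    # texts: list of short user strings; labels: their categories.
    # count_nb.fit(texts, labels)
    # tfidf_nb.fit(texts, labels)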

I've read through the documentation for the classes used in both methods, and both seem to address my problem very well.

For this type of problem, why might tf-idf weighting with a standard Naive Bayes model outperform multinomial Naive Bayes? Are there any obvious problems with either approach?

Answer

Naive Bayes and MultinomialNB are the same algorithm. The difference you get comes from the tf-idf transformation, which penalises words that occur in many documents across your corpus.

My advice: use tf-idf and tune the sublinear_tf, binary, and normalization (norm) parameters of TfidfVectorizer for your features.
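As a sketch of those knobs (TfidfVectorizer bundles CountVectorizer and TfidfTransformer into a single step; the values below are illustrative starting points, not recommendations):

    from sklearn.feature_extraction.text import TfidfVectorizer

    vectorizer = TfidfVectorizer(
        sublinear_tf=True,  # replace raw term frequency tf with 1 + log(tf)
        binary=False,       # if True, all non-zero term counts become 1
        norm="l2",          # per-document normalization: "l1", "l2", or None
    )
    # X_train = vectorizer.fit_transform(texts)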

Also try all kinds of different classifiers available in scikit-learn, which I suspect will give you better results if you properly tune the regularization type (penalty, either l1 or l2) and the regularization parameter (alpha).

If you tune them properly, I suspect you can get much better results using SGDClassifier with 'log' loss (logistic regression) or 'hinge' loss (SVM).
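For example (parameter values here are illustrative; note that scikit-learn 1.1+ spells the logistic loss "log_loss" rather than "log"):

    from sklearn.linear_model import SGDClassifier

    # loss="hinge" gives a linear SVM; loss="log" (or "log_loss" in
    # scikit-learn >= 1.1) gives logistic regression.
    clf = SGDClassifier(
        loss="hinge",
        penalty="l2",  # regularization type: "l1" or "l2"
        alpha=1e-4,    # regularization strength, worth tuning
    )
    # clf.fit(X_train, labels)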

The way people usually tune the parameters is through the GridSearchCV class in scikit-learn.
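A minimal sketch of what that usually looks like over a full pipeline, using the step__parameter naming convention to address each step (the grid values are illustrative, not a recommended search space):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    pipe = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", SGDClassifier(loss="hinge")),
    ])

    param_grid = {
        "tfidf__sublinear_tf": [True, False],
        "tfidf__binary": [True, False],
        "clf__penalty": ["l1", "l2"],
        "clf__alpha": [1e-5, 1e-4, 1e-3],
    }

    # 5-fold cross-validation over every parameter combination.
    search = GridSearchCV(pipe, param_grid, cv=5)
    # search.fit(texts, labels)
    # print(search.best_params_, search.best_score_)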
