Python TextBlob and text classification


Question

I'm trying to build a text classification model with Python and TextBlob. The script runs on my server, and the idea is that in the future users will be able to submit their text and have it classified. I'm loading the training set from a CSV file:

# -*- coding: utf-8 -*-
import sys

from textblob.classifiers import NaiveBayesClassifier

# Redirect everything printed below into a text file.
sys.stdout = open('yyyyyyyyy.txt', 'w')

# Train on the CSV file; each row holds the text, then its label.
with open('file.csv', 'r', encoding='latin-1') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

print(cl.classify("some text"))
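With `format="csv"`, TextBlob expects each row to hold the text in the first column and the label in the second. One way to confirm that a row (such as the sample quoted further down in this question) splits into exactly those two fields is Python's standard `csv` module. This is only a sanity-check sketch, independent of TextBlob:

```python
import csv
import io

# The sample row from the training file: quoted text, then the label.
row = ('"Oggi alla Camera con la Fondazione Italia-Usa abbiamo consegnato '
       'a 140 studenti laureati con 110 e 110 lode i diplomi del Master in '
       'Marketing Comunicazione e Made in Italy.",FI-PDL\n')

# csv.reader handles the quoting, so a comma inside the text would not
# be mistaken for the column separator.
text, label = next(csv.reader(io.StringIO(row)))
print(label)
print(len(text))
```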

The CSV file is about 500 lines long (with strings between 10 and 100 characters), and NaiveBayesClassifier needs about 2 minutes to train before it can classify my text. I'm not sure whether it is normal for training to take that long; maybe my server is just slow, with only 512 MB of RAM.
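To see how much of the run time is really spent in training, one option is to time that single call in isolation. The helper below is a generic sketch, not part of TextBlob; `timed` is a hypothetical name, and the stand-in workload would be replaced by a function that builds the classifier.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload so the sketch runs without TextBlob installed.
# In the real script, pass a function that constructs the classifier.
result, seconds = timed(sum, range(1000))
print(f"finished in {seconds:.4f} s")
```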

Example of a CSV row:

"Oggi alla Camera con la Fondazione Italia-Usa abbiamo consegnato a 140 studenti laureati con 110 e 110 lode i diplomi del Master in Marketing Comunicazione e Made in Italy.",FI-PDL

What is not clear to me, and I can't find an answer in the TextBlob documentation, is whether there is a way to save my trained classifier (which would save a lot of time), because right now the classifier is trained again every time I run the script. I'm new to text classification and machine learning, so I apologize if this is a dumb question.

Thanks.

Recommended answer

OK, I found that the pickle module is what I need :)

Training:

# -*- coding: utf-8 -*-
import pickle

from textblob.classifiers import NaiveBayesClassifier

# Train the classifier once from the CSV file.
with open('file.csv', 'r', encoding='latin-1') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

# Save the trained classifier to disk so later runs can skip training.
with open('classifier.pickle', 'wb') as f:
    pickle.dump(cl, f)

Loading:

import pickle
import sys

# Redirect printed output into a text file.
sys.stdout = open('demo.txt', 'w')

# Load the saved classifier (textblob must still be installed).
with open('classifier.pickle', 'rb') as f:
    cl = pickle.load(f)

print(cl.classify("text to classify"))
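The two scripts can also be folded into one "train only if no saved copy exists" step. The following is a minimal sketch under stated assumptions: `load_or_train` and `fake_train` are hypothetical names, and a word-count `Counter` stands in for the classifier so the example runs without TextBlob.

```python
import os
import pickle
import tempfile
from collections import Counter


def load_or_train(path, train_fn):
    """Return the object pickled at path, or train, save, and return it.

    Hypothetical helper: train_fn is any zero-argument callable that
    returns a picklable model.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)  # only unpickle files you created yourself
    model = train_fn()
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model


# Stand-in "training" so the sketch runs without TextBlob: count words.
calls = []


def fake_train():
    calls.append(1)
    return Counter("made in italy made".split())


cache_path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")
first = load_or_train(cache_path, fake_train)   # trains and saves
second = load_or_train(cache_path, fake_train)  # loads from disk
print(len(calls))
```

In the real script, `train_fn` would wrap the `NaiveBayesClassifier(fp, format="csv")` call, and a long-running server could call `load_or_train` once at startup and reuse the returned classifier for every submitted text. Because `pickle.load` can execute arbitrary code, the pickle file should only ever come from your own training run.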
