How to work with n-grams for classification tasks?


Question


I'm going to train a classifier on a sample dataset using n-grams. I searched for related content and wrote the code below. As I'm a beginner in Python, I have two questions.


1- Why should the dictionary have this 'True' structure (marked with a comment)? Is this related to the Naive Bayes classifier's input?


2- Which classifier do you recommend for this task?


Any other suggestions to shorten the code are welcome :).

from nltk.corpus import movie_reviews
from nltk.corpus import stopwords
from nltk import ngrams
from nltk.classify import NaiveBayesClassifier
import nltk.classify.util


stoplist = set(stopwords.words("english"))


def stopword_removal(words):
    useful_words = [word for word in words if word not in stoplist]
    return useful_words


def create_ngram_features(words, n):
    ngram_vocab = ngrams(words, n)
    my_dict = dict([(ng, True) for ng in ngram_vocab])  # HERE
    return my_dict


for n in [1, 2]:
    positive_data = []
    for fileid in movie_reviews.fileids('pos'):
        words = stopword_removal(movie_reviews.words(fileid))
        positive_data.append((create_ngram_features(words, n), "positive"))
    print('\n\n---------- Positive Data Sample----------\n', positive_data[0])

    negative_data = []
    for fileid in movie_reviews.fileids('neg'):
        words = stopword_removal(movie_reviews.words(fileid))
        negative_data.append((create_ngram_features(words, n), "negative"))
    print('\n\n---------- Negative Data Sample ----------\n', negative_data[0])

    train_set = positive_data[:100] + negative_data[:100]
    test_set = positive_data[100:] + negative_data[100:]

    classifier = NaiveBayesClassifier.train(train_set)

    accuracy = nltk.classify.util.accuracy(classifier, test_set)
    print('\n', str(n)+'-gram accuracy:', accuracy)

Answer


Before training, you need to transform your n-grams into a matrix of codes of size <number_of_documents, max_document_representation_length>. For example, a common document representation is a bag-of-words, where each word/n-gram of the corpus dictionary is mapped to its frequency in the document.


The Naive Bayes classifier is the simplest classifier, but it performs poorly on noisy data and needs a balanced class distribution for training. You can try a boosting classifier, for example a gradient boosting machine, or a support vector machine.
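A minimal sketch of trying those alternatives on such a matrix, assuming scikit-learn (`GradientBoostingClassifier` and `LinearSVC` are its gradient boosting and linear SVM implementations; the tiny corpus and its labels are made up for illustration, not taken from `movie_reviews`):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Toy corpus, made up for illustration; in practice, use the
# movie_reviews documents and their pos/neg labels instead.
docs = ["great great movie", "wonderful fun film",
        "dull boring mess", "terrible awful plot"] * 10
labels = ["positive", "positive", "negative", "negative"] * 10

# Bag-of-words matrix over unigrams and bigrams.
X = CountVectorizer(ngram_range=(1, 2)).fit_transform(docs)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0)

# Fit each classifier and report its accuracy on the held-out split.
scores = {}
for clf in (GradientBoostingClassifier(), LinearSVC()):
    clf.fit(X_train, y_train)
    scores[type(clf).__name__] = clf.score(X_test, y_test)
print(scores)
```

Both classifiers accept the sparse matrix produced by `CountVectorizer` directly, so the same feature pipeline can be reused while swapping models.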


All of these classifiers and transformers are available in the scikit-learn library.
