Load classified data from CSV to Scikit-Learn for machine learning

Views: 214 Published: 2020/5/4 9:38:39 Tags: python csv machine-learning scikit-learn classification

Question

I'm learning Scikit-Learn to classify tweets. I have a CSV with the tweets in one column and their class (0 to 11) in the next column. I went through this tutorial from the Scikit-Learn site, and I think I understand how the actual classification is done, but I don't think I really understood the data format. In the tutorial, the material was in files in folders, where the folder names acted as classification labels.

In my case I need to load the data from a CSV file, and apparently I have to manually construct the data structure that is fed to the vectorizer and classifier. How should I approach this? I think the tutorial was a bit ambiguous in this respect, since the data loading was done automagically, which left me in the dark about the structure and loading of custom data.

Recommended answer

Normally you would use pandas.read_csv. If you don't want a pandas dependency, you can use numpy.genfromtxt (note that numpy.load reads .npy/.npz files, not CSV), or simply load the CSV into a list with the standard library's csv module. With pandas it would look like this:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv('example.csv', header=None, sep=',',
                 names=['tweets', 'class'])   # column names, since the file has no header row
vect = TfidfVectorizer()
X = vect.fit_transform(df['tweets'])   # sparse TF-IDF matrix, one row per tweet
y = df['class']                        # class labels (0-11)
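
If you would rather avoid the pandas dependency, the standard-library route mentioned above could look roughly like the sketch below. It assumes the same layout as in the question (tweet text in the first column, integer class in the second, no header row); the in-memory sample and the variable names are made up for illustration and stand in for opening the real file.

```python
import csv
import io

# Hypothetical sample standing in for the contents of 'example.csv':
# one tweet and one integer class label per row, no header row.
# With a real file you would use: open('example.csv', newline='', encoding='utf-8')
sample_csv = io.StringIO(
    '"good morning, world",0\n'
    '"traffic is terrible today",3\n'
    '"new phone arrived",7\n'
)

tweets, labels = [], []
for text, label in csv.reader(sample_csv):
    tweets.append(text)        # raw tweet text, commas inside quotes are preserved
    labels.append(int(label))  # class label as an integer

print(tweets)
print(labels)
```

The resulting plain lists can be used in place of df['tweets'] and df['class'], since TfidfVectorizer.fit_transform accepts any iterable of strings.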

Once you have your X and y, you can feed them to a classifier.
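
As a rough illustration of that last step, here is a minimal sketch that fits a classifier on X and y and predicts the class of a new tweet. The answer does not name a classifier; MultinomialNB is just one common choice for text, and the toy tweets and labels are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data in place of the two CSV columns.
tweets = [
    "great match last night",
    "what a goal in the final",
    "new phone battery lasts long",
    "this laptop screen is sharp",
]
labels = [0, 0, 1, 1]

vect = TfidfVectorizer()
X = vect.fit_transform(tweets)   # sparse matrix, one row per tweet
y = labels

clf = MultinomialNB()            # assumed classifier, not specified in the answer
clf.fit(X, y)

# New text must go through the same fitted vectorizer:
# use transform here, not fit_transform.
new_X = vect.transform(["the goal was great"])
predicted = clf.predict(new_X)
print(predicted)
```

The key point is that the vectorizer is fitted once on the training tweets and then reused for any new text, so the feature columns line up with what the classifier saw during training.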
