如何在scikit中加载CSV数据并将其用于朴素贝叶斯分类 [英] how to Load CSV Data in scikit and using it for Naive Bayes Classification

查看:242
本文介绍了如何在scikit中加载CSV数据并将其用于朴素贝叶斯分类的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

尝试加载自定义数据以在Scikit中执行NB分类.在将样本数据加载到Scikit然后执行NB时需要帮助.如何为目标加载分类值.

Trying to load custom data to perform NB Classification in Scikit. Need help in loading the sample data into Scikit and then perform NB. How to load categorical values for target.

对培训和测试使用相同的数据,或者仅对测试使用完整的数据.

Use the same data for Train and Test or use a complete set just for test.

Sl No,Member ID,Member Name,Location,DOB,Gender,Marital Status,Children,Ethnicity,Insurance Plan ID,Annual Income ($),Twitter User ID
1,70000001,Fly Dorami,New York,39786,M,Single,,Asian,2002,0,548900028
2,70000002,Bennie Ariana,Pennsylvania,6/24/1940,F,Single,,Caucasian,2002,66313,
3,70000003,Brad Farley,Pennsylvania,12001,F,Married,4,African American,2002,98444,
4,70000004,Daggoo Cece,Indiana,14032,F,Married,2,Hispanic,2001,41896,113481472.

推荐答案

以下内容将帮助您入门,您将需要熊猫和numpy.您可以将.csv加载到数据框中,然后将其输入到模型中.因此,您都需要根据您尝试分离的对象来定义目标(负数为0,正数为1,假设是二进制分类).

The following should get you started you will need pandas and numpy. You can load your .csv into a data frame and use that to input into the model. You all so need to define targets (0 for negatives and 1 for positives, assuming binary classification) depending on what you are trying to separate.

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np

# create data frame containing your data, each column can be accessed # by df['column   name']
df = pd.read_csv('/your/path/yourFile.csv')

target_names = np.array(['Positives','Negatives'])

# add columns to your data frame
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75
df['Type'] = pd.Factor(targets, target_names)
df['Targets'] = targets

# define training and test sets
train = df[df['is_train']==True]
test = df[df['is_train']==False]

trainTargets = np.array(train['Targets']).astype(int)
testTargets = np.array(test['Targets']).astype(int)

# columns you want to model
features = df.columns[0:7]

# call Gaussian Naive Bayesian class with default parameters
gnb = GaussianNB()

# train model
y_gnb = gnb.fit(train[features], trainTargets).predict(train[features])

这篇关于如何在scikit中加载CSV数据并将其用于朴素贝叶斯分类的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆