Train-test Split of a CSV file in Python
Question
I have a .csv file that contains my data. I would like to do Logistic Regression, Naive Bayes and Decision Trees. I already know how to implement these.
However, my teacher wants me to split the data in my .csv file so that 80% is used for training and my algorithms predict the other 20%. I would like to know how to actually split the data in that way.
import pandas as pd

diabetes_df = pd.read_csv("diabetes.csv")
diabetes_df.head()

with open("diabetes.csv", "rb") as f:
    data = f.read().split()
    train_data = data[:80]
    test_data = data[20:]
I tried to split it like this, but I am sure it isn't working.
Recommended answer
Workflow
- Load the data (see "How do I read and write CSV files with Python?")
- Preprocess the data (e.g. filtering / creating new features)
- Make the train-test (validation and dev-set) split
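The three steps above can be sketched with pandas alone. This is only an illustration under stated assumptions: the in-memory CSV text and its column names (Glucose, BMI, Outcome) are hypothetical stand-ins for diabetes.csv, and dropping rows with missing values is just one possible preprocessing step.

```python
import io

import pandas as pd

# Hypothetical stand-in for diabetes.csv so the sketch runs on its own;
# with the real file you would call pd.read_csv("diabetes.csv") instead.
csv_text = "Glucose,BMI,Outcome\n" + "\n".join(
    f"{80 + i},{20 + i % 15},{i % 2}" for i in range(100)
)

# 1. Load the data
df = pd.read_csv(io.StringIO(csv_text))

# 2. Preprocess (here: drop rows with missing values, as an example)
df = df.dropna()

# 3. Split: randomly sample 80% of the rows for training,
#    and keep the remaining 20% for testing
train_df = df.sample(frac=0.8, random_state=0)
test_df = df.drop(train_df.index)

print(len(train_df), len(test_df))  # 80 20
```

Sampling rows at random, rather than slicing the first 80 entries, keeps the split at 80/20 for any file length and avoids the overlap that slices such as data[:80] and data[20:] produce.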
Code
scikit-learn's sklearn.model_selection.train_test_split is what you are looking for:
from sklearn.model_selection import train_test_split

# test_size is the fraction held out for testing; 0.2 gives the 80/20 split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
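To show where X and y might come from, here is a minimal end-to-end sketch. It assumes the label lives in a column named Outcome; that name, the other column names, and the synthetic in-memory data are hypothetical stand-ins for the real diabetes.csv. The stratify=y argument is optional and keeps the class proportions the same in both parts.

```python
import io

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for diabetes.csv; replace with
# pd.read_csv("diabetes.csv") and your real column names.
csv_text = "Glucose,BMI,Outcome\n" + "\n".join(
    f"{80 + i},{20 + i % 15},{int(i >= 50)}" for i in range(100)
)
df = pd.read_csv(io.StringIO(csv_text))

# Features are every column except the (assumed) label column
X = df.drop(columns=["Outcome"])
y = df["Outcome"]

# Hold out 20% of the rows for testing; stratify keeps class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit on the 80% and evaluate on the held-out 20%
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```

The same X_train / X_test split can then be reused for the Naive Bayes and Decision Tree models, so all three are compared on identical held-out rows.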