Python equivalent to R caTools random 'sample.split'


Question

Is there a Python (perhaps pandas) equivalent to R's

install.packages("caTools")
library(caTools)
set.seed(88)
split = sample.split(df$col, SplitRatio = 0.75)

that will generate exactly the same split?

My current context for this is, as an example, getting pandas DataFrames that correspond exactly to the R data frames (qualityTrain, qualityTest) created by:

# https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv
quality = read.csv("quality.csv")
set.seed(88)
split = sample.split(quality$PoorCare, SplitRatio = 0.75)
qualityTrain = subset(quality, split == TRUE)
qualityTest = subset(quality, split == FALSE)

Recommended answer

I think scikit-learn's train_test_split function might work for you.

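import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split

url = 'https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv'
quality = pd.read_csv(url)

# 75% of the rows go to train, the rest to test; random_state fixes the shuffle
train, test = train_test_split(quality, train_size=0.75, random_state=88)

# Rebuild DataFrames in case train_test_split returned plain arrays
# (harmless in versions that already return DataFrames)
qualityTrain = pd.DataFrame(train, columns=quality.columns)
qualityTest = pd.DataFrame(test, columns=quality.columns)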

Unfortunately, I don't get the same rows as the R function. I'm guessing it's the seeding, but I could be wrong.
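One further difference may be worth noting: the caTools documentation describes sample.split as preserving the relative ratios of the labels in the vector it is given, whereas the train_test_split call above shuffles all rows together. The following is a minimal sketch of that per-label behaviour using only the Python standard library. The function name sample_split, its parameters, and the toy labels are assumptions made for illustration, and it uses Python's own random generator, so it should not be expected to select the same rows as R with set.seed(88).

```python
import random
from collections import defaultdict


def sample_split(labels, split_ratio=0.75, seed=None):
    """Return a list of booleans, True for rows assigned to the training set.

    Hypothetical helper: it samples separately within each label so that the
    label proportions are roughly preserved. It does NOT reproduce R's random
    number stream.
    """
    rng = random.Random(seed)

    # Group row positions by their label value.
    positions_by_label = defaultdict(list)
    for position, label in enumerate(labels):
        positions_by_label[label].append(position)

    mask = [False] * len(labels)
    for positions in positions_by_label.values():
        # Number of rows of this label that go to the training set.
        n_train = round(len(positions) * split_ratio)
        for position in rng.sample(positions, n_train):
            mask[position] = True
    return mask


# Toy data standing in for quality['PoorCare'] (values are made up).
labels = [0] * 12 + [1] * 4
split = sample_split(labels, split_ratio=0.75, seed=88)
```

With a pandas DataFrame of the same length, the resulting list could be used as a boolean mask, for example quality.loc[split] for the training rows. If scikit-learn is preferred, train_test_split also accepts a stratify argument that serves a similar purpose.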
