Python equivalent to R caTools random 'sample.split'
Question
Is there a Python (perhaps pandas) equivalent to R's
install.packages("caTools")
library(caTools)
set.seed(88)
split = sample.split(df$col, SplitRatio = 0.75)
that will generate exactly the same value of split?
My current context for this is, as an example, getting pandas dataframes that correspond exactly to the R dataframes (qualityTrain, qualityTest) created by:
# https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv
quality = read.csv("quality.csv")
set.seed(88)
split = sample.split(quality$PoorCare, SplitRatio = 0.75)
qualityTrain = subset(quality, split == TRUE)
qualityTest = subset(quality, split == FALSE)
Recommended answer
I think scikit-learn's train_test_split function might work for you.
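import pandas as pd
# train_test_split lives in sklearn.model_selection; the old sklearn.cross_validation module has been removed
from sklearn.model_selection import train_test_split

url = 'https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv'
quality = pd.read_csv(url)
# 75% of the rows go to the training set; random_state seeds scikit-learn's shuffle, not R's
train, test = train_test_split(quality, train_size=0.75, random_state=88)
# Wrap the results in DataFrames that keep the original column names
qualityTrain = pd.DataFrame(train, columns=quality.columns)
qualityTest = pd.DataFrame(test, columns=quality.columns)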
Unfortunately, I don't get the same rows as the R function. I'm guessing it's the seeding, but I could be wrong.
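Besides the seed, one other difference may matter. As far as I understand the caTools documentation, sample.split preserves the relative ratio of the labels in the column it is given, while a plain train_test_split shuffles all rows together unless you pass stratify. Below is a rough sketch of that label-preserving behaviour using only the standard library. The helper name sample_split, its parameters, and the toy labels are my own illustrative choices, not part of caTools or scikit-learn. It will not pick the same rows as R even with the same seed, because R and Python seed and consume their random generators differently.

```python
import random
from collections import defaultdict


def sample_split(labels, split_ratio=0.75, seed=None):
    """Return a list of booleans: True marks a training row, False a test row.

    Hypothetical helper that mimics the label-preserving behaviour of
    caTools' sample.split. It does NOT reproduce R's random numbers.
    """
    rng = random.Random(seed)

    # Group row positions by label so that each label is split separately.
    positions = defaultdict(list)
    for i, label in enumerate(labels):
        positions[label].append(i)

    mask = [False] * len(labels)
    for idx in positions.values():
        # Take roughly split_ratio of the rows carrying this label.
        n_train = round(len(idx) * split_ratio)
        for i in rng.sample(idx, n_train):
            mask[i] = True
    return mask


# Toy stand-in for quality["PoorCare"]: 12 zeros and 4 ones.
labels = [0] * 12 + [1] * 4
mask = sample_split(labels, split_ratio=0.75, seed=88)
train_labels = [y for y, m in zip(labels, mask) if m]
print(len(train_labels), sum(train_labels))  # prints: 12 3
```

With pandas, a mask like this could be applied as quality[mask] for the training rows and quality[[not m for m in mask]] for the test rows. If you only need the same class balance rather than the same rows, passing stratify=quality['PoorCare'] to train_test_split should give a comparable effect.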