Splitting a dataset for training and evaluation in BigQuery ML

Question

Does BigQuery ML automatically split the dataset for training and evaluation? Or do we have to manually set aside 80% of the dataset for training, 10% for validation, and 10% for evaluation when using logistic regression in BigQuery ML? If both are possible, which would be better?

Thanks

Recommended answer

Yes, BigQuery ML will automatically split the data for its validation process. It is also fairly common practice to manually set aside a holdout set, so that you can perform additional validation on data the model has never seen.
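As an illustration of the manual holdout idea, here is a minimal Python sketch that deterministically assigns rows to an 80/10/10 split by hashing a row identifier. The function name `assign_split`, the `user-N` identifiers, and the choice of SHA-256 are assumptions made for this example; they are not part of BigQuery ML, and the same idea would normally be expressed in SQL over a stable key column.

```python
import hashlib


def assign_split(row_id: str) -> str:
    """Map a row identifier to 'train', 'validation', or 'holdout'.

    Hashing makes the assignment repeatable: the same identifier
    always lands in the same split, unlike a fresh random draw.
    """
    digest = hashlib.sha256(row_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10  # bucket is in 0..9
    if bucket < 8:
        return "train"        # buckets 0-7, roughly 80% of rows
    if bucket == 8:
        return "validation"   # bucket 8, roughly 10% of rows
    return "holdout"          # bucket 9, roughly 10% of rows


# Hypothetical row identifiers, used only to demonstrate the split.
rows = [f"user-{i}" for i in range(1000)]
counts = {"train": 0, "validation": 0, "holdout": 0}
for row_id in rows:
    counts[assign_split(row_id)] += 1

print(counts)
```

The proportions are only approximate, because they depend on how the hash values happen to fall, but the holdout rows stay the same from run to run and can be kept out of training entirely.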

You can use the DATA_SPLIT_METHOD option to tell BigQuery ML how you want the data to be split. The default is AUTO_SPLIT, which is defined as follows:
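To show where this option sits in a model definition, the following Python sketch only assembles the text of a `CREATE MODEL` statement; it does not run anything against BigQuery. The helper `build_create_model_sql` and the model, table, and label names are hypothetical placeholders. `MODEL_TYPE`, `INPUT_LABEL_COLS`, `DATA_SPLIT_METHOD`, and `DATA_SPLIT_EVAL_FRACTION` are existing BigQuery ML options, but check the official documentation for the values your model type accepts.

```python
def build_create_model_sql(model, table, label,
                           split_method="AUTO_SPLIT", eval_fraction=None):
    """Return the text of a CREATE MODEL statement for logistic regression.

    All identifiers passed in are placeholders chosen by the caller.
    """
    options = [
        "MODEL_TYPE = 'LOGISTIC_REG'",
        f"INPUT_LABEL_COLS = ['{label}']",
        f"DATA_SPLIT_METHOD = '{split_method}'",
    ]
    # The evaluation fraction is only added when the caller supplies one,
    # for example together with a RANDOM split.
    if eval_fraction is not None:
        options.append(f"DATA_SPLIT_EVAL_FRACTION = {eval_fraction}")
    options_sql = ",\n  ".join(options)
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (\n  {options_sql}\n) AS\n"
        f"SELECT * FROM `{table}`"
    )


# Hypothetical names, for illustration only.
sql = build_create_model_sql(
    model="mydataset.my_model",
    table="mydataset.my_training_table",
    label="label",
    split_method="RANDOM",
    eval_fraction=0.2,
)
print(sql)
```

Leaving `split_method` at its default produces a statement that relies on AUTO_SPLIT, which matches what BigQuery ML does when the option is omitted.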

When there are fewer than 500 rows in the input data, all rows are used as training data. When there are between 500 and 50,000 rows in the input data, 20% of the data is used as evaluation data in a RANDOM split. When there are more than 50,000 rows in the input data, only 10,000 of them are used as evaluation data in a RANDOM split.
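The rule above can be written out as a small Python function that returns the expected number of evaluation rows for a given input size. This is a sketch of the quoted thresholds only; `auto_split_eval_rows` is a hypothetical helper, and because the split is random the actual count BigQuery ML produces may differ slightly from these figures.

```python
def auto_split_eval_rows(n_rows: int) -> int:
    """Expected evaluation-set size under the AUTO_SPLIT rule quoted above."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_rows < 500:
        return 0             # every row is used for training
    if n_rows <= 50_000:
        return n_rows // 5   # about 20% held out for evaluation
    return 10_000            # fixed evaluation size for large inputs


for n in (499, 500, 20_000, 50_000, 1_000_000):
    eval_rows = auto_split_eval_rows(n)
    print(n, "rows:", eval_rows, "evaluation,", n - eval_rows, "training")
```

Note that the two upper branches agree at the boundary: 20% of 50,000 rows is 10,000, so the evaluation set stops growing once the input passes that size.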

For more information, I would recommend reading the official documentation.
