Splitting a dataset for training and evaluation in BigQuery ML
Question
Does BigQuery ML automatically split the dataset for training and evaluation? Or do we have to manually set aside 80% of the dataset for training, 10% for validation and 10% for evaluation when using logistic regression in BigQuery ML? If both are possible, which approach is better?
Thanks
Answer
Yes, BigQuery ML automatically splits the data for its evaluation process. It is also fairly common practice to manually split off a holdout set so that you can perform additional validation on data the model has never seen.
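A minimal sketch of such a manual split, assuming each row has a stable key (the key format and the assign_split helper are hypothetical): hash the key into 100 buckets and map buckets 0 to 79 to training, 80 to 89 to validation and 90 to 99 to test, which gives the 80/10/10 split from the question. Because the assignment depends only on the key, a row lands in the same split every time the query is rerun. In BigQuery itself this idea is commonly written in SQL with a hash function such as FARM_FINGERPRINT; here MD5 from Python's hashlib stands in for it.

```python
import hashlib


def assign_split(row_key: str, train_pct: int = 80, valid_pct: int = 10) -> str:
    """Deterministically map a row key to 'train', 'validation' or 'test'.

    Hypothetical helper: hashes the key into one of 100 buckets so that the
    same key always falls into the same split across runs.
    """
    if not 0 <= train_pct + valid_pct <= 100:
        raise ValueError("percentages must sum to at most 100")
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in the range 0..99
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "validation"
    return "test"  # remaining buckets form the untouched holdout set


if __name__ == "__main__":
    counts = {"train": 0, "validation": 0, "test": 0}
    for i in range(10_000):
        counts[assign_split(f"customer-{i}")] += 1
    print(counts)  # per-split row counts for 10,000 hypothetical keys
```

Rows tagged "test" would simply be left out of the CREATE MODEL query and scored afterwards, for example with ML.EVALUATE against that holdout table.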
You can use the DATA_SPLIT_METHOD option of the CREATE MODEL statement to tell BigQuery ML how you want to split the data. The default is AUTO_SPLIT, which is defined as follows:
When there are fewer than 500 rows in the input data, all rows are used as training data. When there are between 500 and 50,000 rows in the input data, 20% of the data is used as evaluation data in a RANDOM split. When there are more than 50,000 rows in the input data, only 10,000 of them are used as evaluation data in a RANDOM split.
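To make the AUTO_SPLIT rule quoted above concrete, here is a small Python sketch that restates its thresholds as a function and renders a CREATE MODEL statement that overrides the default. The helper names and the dataset, table and column names are hypothetical; model_type, input_label_cols, data_split_method and data_split_eval_fraction are BigQuery ML options. How rows at exactly 500 and exactly 50,000 are treated is an assumption, and BigQuery ML picks the actual evaluation rows at random, so the count below is only approximate.

```python
from typing import Optional

# Split methods handled by this sketch; CUSTOM and SEQ also exist but need a
# data_split_col, which is omitted here for brevity.
SUPPORTED_METHODS = ("AUTO_SPLIT", "RANDOM", "NO_SPLIT")


def auto_split_eval_rows(n_rows: int) -> int:
    """Approximate evaluation-set size under the AUTO_SPLIT rule.

    Restates the documented thresholds only. Treating exactly 500 and
    exactly 50,000 rows as the 20% case is an assumption.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_rows < 500:
        return 0  # every row is used for training
    if n_rows <= 50_000:
        return round(0.2 * n_rows)  # roughly 20% held out at random
    return 10_000  # fixed-size random evaluation set


def create_model_sql(model: str, table: str, label: str,
                     split_method: str = "AUTO_SPLIT",
                     eval_fraction: Optional[float] = None) -> str:
    """Render a logistic-regression CREATE MODEL statement (hypothetical helper)."""
    if split_method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported split method: {split_method}")
    options = [
        "model_type='logistic_reg'",
        f"input_label_cols=['{label}']",
        f"data_split_method='{split_method}'",
    ]
    if eval_fraction is not None:
        if split_method != "RANDOM":
            raise ValueError("eval_fraction is only used with RANDOM in this sketch")
        options.append(f"data_split_eval_fraction={eval_fraction}")
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS({', '.join(options)})\n"
        f"AS SELECT * FROM `{table}`"
    )


if __name__ == "__main__":
    for n in (300, 10_000, 2_000_000):
        print(n, auto_split_eval_rows(n))
    print(create_model_sql("mydataset.churn_model", "mydataset.training_rows",
                           "churned", split_method="RANDOM", eval_fraction=0.1))
```

Keeping the default AUTO_SPLIT is usually enough; switching to RANDOM with an explicit fraction is one way to control the evaluation size yourself, while a separate holdout set still has to be carved out before training.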
For more information, I recommend reading the official documentation.