How to increase the sample size used during schema discovery to 'unlimited'?
Problem description
I have encountered some errors with the SDP, and one of the potential fixes is to increase the sample size used during schema discovery to 'unlimited'.
For more information on these errors, see:
- No matched schema for {"_id":"...","doc":{...}
- The value type for json field XXXX was presented as YYYY but the discovered data type of the table's column was ZZZZ
- XXXX does not exist in the discovered schema. Document has not been imported
Question:
How can I set the sample size? After I have set the sample size, do I need to trigger a rescan?
Recommended answer
These are the steps you can follow to change the sample size. Be aware that a larger sample size increases the runtime of the schema discovery algorithm, and the only indication in the dashboard is that the job remains in the 'triggered' state for a while.
- Verify that the specific load has been stopped and that the dashboard shows its status as stopped (with or without error).
- Find the document https://<account>.cloudant.com/_warehouser/<source>, where <source> matches the name of the Cloudant database you have issues with.
  Note: If the document id is not obvious, check https://<account>.cloudant.com/_warehouser/_all_docs.
- Substitute "sample_size": null (which scans a sample of 10,000 random documents) with "sample_size": -1 (to scan all documents in your database) or "sample_size": X (to scan X documents in your database, where X is a positive integer).
- Save the document and trigger a rescan in the dashboard. A new schema discovery run will execute using the defined sample size.
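If you prefer to script the document edit rather than do it by hand, the steps above could be sketched roughly as follows in Python. This is only a sketch under several assumptions: that the `_warehouser` document can be read and written through the standard Cloudant document API (GET, then PUT with the current `_rev`), that basic authentication works for your account, and that the document already contains a `sample_size` key. The answer does not say where in the document that key sits, so the helper replaces it wherever it already exists. The function names `set_sample_size` and `update_warehouser_doc` are made up for illustration. You still need to stop the load beforehand and trigger the rescan in the dashboard afterwards.

```python
import base64
import copy
import json
import urllib.parse
import urllib.request


def _is_valid(sample_size):
    # Allowed values per the steps above: null (None), -1, or a positive integer.
    if sample_size is None:
        return True
    if isinstance(sample_size, bool) or not isinstance(sample_size, int):
        return False
    return sample_size == -1 or sample_size > 0


def _replace(node, value):
    # Walk nested dicts/lists and overwrite every existing "sample_size" key.
    count = 0
    if isinstance(node, dict):
        for key in node:
            if key == "sample_size":
                node[key] = value
                count += 1
            else:
                count += _replace(node[key], value)
    elif isinstance(node, list):
        for item in node:
            count += _replace(item, value)
    return count


def set_sample_size(doc, sample_size):
    """Return a copy of doc with its existing "sample_size" value(s) replaced."""
    if not _is_valid(sample_size):
        raise ValueError("sample_size must be None, -1, or a positive integer")
    updated = copy.deepcopy(doc)
    if _replace(updated, sample_size) == 0:
        # Assumption: the key is already present; we do not guess where to add it.
        raise KeyError("no 'sample_size' key found in the document")
    return updated


def update_warehouser_doc(account, source, user, password, sample_size):
    """Fetch the _warehouser document for <source>, change sample_size, save it."""
    doc_id = urllib.parse.quote(source, safe="")
    url = f"https://{account}.cloudant.com/_warehouser/{doc_id}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}",
               "Content-Type": "application/json"}

    # GET the current document; it includes the _rev needed for the update.
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        doc = json.load(resp)

    # PUT the modified document back under the same id.
    body = json.dumps(set_sample_size(doc, sample_size)).encode()
    request = urllib.request.Request(url, data=body, headers=headers, method="PUT")
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

For example, `update_warehouser_doc("<account>", "<source>", "<user>", "<password>", -1)` would request a scan of all documents. Passing `None` writes `null` back, which restores the default sample of 10,000 random documents.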