How to increase the sample size used during schema discovery to 'unlimited'?

Problem description

I have encountered some errors with the SDP, and one of the potential fixes is to increase the sample size used during schema discovery to 'unlimited'.

For more information on these errors, see:

  • No matched schema for {"_id":"...","doc":{...}
  • The value type for json field XXXX was presented as YYYY but the discovered data type of the table's column was ZZZZ
  • XXXX does not exist in the discovered schema. Document has not been imported

Question:

How can I set the sample size? After I have set it, do I need to trigger a rescan?

Recommended answer

These are the steps you can follow to change the sample size. Be aware that a larger sample size increases the runtime of the algorithm, and the dashboard gives no indication of this other than the job remaining in the 'triggered' state for a while.


  1. Verify that the specific load has been stopped and that the dashboard shows its status as stopped (with or without error).

  2. Find the document https://<account>.cloudant.com/_warehouser/<source>, where <source> matches the name of the Cloudant database you have issues with.

     Note: Check https://<account>.cloudant.com/_warehouser/_all_docs if the document ID is not obvious.

  3. Replace "sample_size": null (which scans a sample of 10,000 random documents) with "sample_size": -1 (to scan all documents in your database) or "sample_size": X (to scan X documents in your database, where X is a positive integer).

  4. Save the document and trigger a rescan in the dashboard. A new schema discovery run will execute using the defined sample size.
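If you prefer to script the document edit rather than use the dashboard's document editor, the following is a minimal Python sketch of the edit-and-save part of the steps above. It rests on several assumptions that are not stated in the answer: the function names are hypothetical, the account accepts HTTP basic authentication, and the document is saved with the usual Cloudant read-modify-write pattern (GET the document, then PUT it back including its _rev). Because the answer does not say where "sample_size" sits inside the _warehouser document, the helper updates the key wherever it already appears and refuses to add it if it is missing. The rescan still has to be triggered in the dashboard afterwards.

```python
import base64
import copy
import json
import urllib.parse
import urllib.request


def _replace_key(node, key, value):
    """Set every existing `key` in a nested JSON structure; return how many were set."""
    count = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                count += 1
            else:
                count += _replace_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            count += _replace_key(item, key, value)
    return count


def set_sample_size(doc, sample_size):
    """Return a copy of `doc` with its existing "sample_size" value(s) replaced.

    Allowed values follow the answer: None (default sample), -1 (all documents),
    or a positive integer X.
    """
    if sample_size is not None:
        is_int = isinstance(sample_size, int) and not isinstance(sample_size, bool)
        if not is_int or (sample_size != -1 and sample_size < 1):
            raise ValueError("sample_size must be None, -1, or a positive integer")
    updated = copy.deepcopy(doc)
    if _replace_key(updated, "sample_size", sample_size) == 0:
        raise KeyError("no 'sample_size' key found in the document")
    return updated


def update_warehouser_doc(account, source, user, password, sample_size):
    """Hypothetical helper: fetch the _warehouser document, edit it, and save it.

    Assumes basic authentication; adapt the headers if your account uses
    a different scheme.
    """
    doc_id = urllib.parse.quote(source, safe="")
    url = f"https://{account}.cloudant.com/_warehouser/{doc_id}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Content-Type": "application/json"}

    # Read the current document; it carries the _rev needed for the update.
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        doc = json.load(resp)

    # Write the modified document back under the same ID.
    body = json.dumps(set_sample_size(doc, sample_size)).encode()
    request = urllib.request.Request(url, data=body, headers=headers, method="PUT")
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

For example, update_warehouser_doc("myaccount", "mydb", "user", "secret", -1) would request a scan of all documents, where the account, database, and credentials are placeholders.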
