Error when trying to import many files from Azure Cosmos DB


Problem description

Hi,

I need to import a lot of data into Azure ML Studio, but as soon as I try to import about 100,000 rows I get the following error:

DocumentDb library exception: DocumentDB client threw an exception. (Error 1000)

I am using SQL code to do the import and to link relational data within the JSON documents in Azure Cosmos DB. When I import 90,000 rows (not necessarily 90,000 JSON documents, since there is more than one entry per JSON document) it works fine, but as soon as I go above 90,000 I get the error above. When I just import, without linking the relational data, I can import as many rows as I want; the error only happens when I try to import more than 90,000 linked relational data rows. I have no idea why this is happening. I am using a free workspace account in Azure ML Studio. Might that be the problem? Does it have something to do with memory running out? Any help would be greatly appreciated.
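
For reference, the query is shaped roughly like the sketch below; the account, database, container, and field names are placeholders, not my real schema. Run through the azure-cosmos Python SDK, it also gives a quick way to count how many rows the intra-document JOIN actually produces:

    from azure.cosmos import CosmosClient

    # Placeholder endpoint, key, and names -- substitute real account details.
    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("<database>").get_container_client("<container>")

    # The intra-document JOIN turns every element of the nested array into its
    # own result row, which is why the row count exceeds the document count.
    query = """
    SELECT c.id, c.customerName, o.orderId, o.total
    FROM c
    JOIN o IN c.orders
    """

    rows = list(container.query_items(query=query, enable_cross_partition_query=True))
    print(len(rows))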

Thanks!

Recommended answer

 

Hi, 

I suspect you are running into some space issues:

https://docs.microsoft.com/en-us/azure/machine-learning/studio/faq

How much data can I use for training?

Modules in Machine Learning Studio support datasets of up to 10 GB of dense numerical data for common use cases. If a module takes more than one input, the total size of all inputs is 10 GB. You can also sample larger datasets via Hive queries or Azure SQL Database queries, or by preprocessing with the Learning with Counts modules before ingestion.
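
If reducing the volume per import is an option, the sketch below (reusing the placeholder account and container names from the question) pulls only a capped sample of the joined rows through the azure-cosmos Python SDK, so a single import stays well under these limits:

    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("<database>").get_container_client("<container>")

    # TOP caps the number of joined rows returned, keeping one batch of
    # relational rows comfortably below the point at which the import fails.
    sample_query = """
    SELECT TOP 50000 c.id, o.orderId, o.total
    FROM c
    JOIN o IN c.orders
    """
    sample = list(container.query_items(query=sample_query, enable_cross_partition_query=True))

Paging with OFFSET ... LIMIT instead of TOP (where the query shape supports it) would be one way to export the full set in several smaller batches rather than a single sample.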

The following types of data can expand into larger datasets during feature normalization and are limited to less than 10 GB:


  • Sparse
  • Categorical
  • Strings
  • Binary data

The following modules are limited to datasets of less than 10 GB:


  • Recommender modules
  • Synthetic Minority Oversampling Technique (SMOTE) module
  • Scripting modules: R, Python, SQL
  • Modules where the output data size can be larger than the input data size, such as Join or Feature Hashing
  • Cross-Validate, Tune Model Hyperparameters, Ordinal Regression, and One-vs-All Multiclass, when the number of iterations is very large

For datasets larger than a few GBs, upload the data to Azure Storage or an Azure SQL Database, or use HDInsight, rather than uploading directly from a local file.
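
If you go the staging route, the sketch below (the connection string, container, and file names are placeholders) uploads an exported file to Azure Blob Storage with the azure-storage-blob Python SDK, so the Import Data module can read it from storage instead of querying Cosmos DB directly:

    from azure.storage.blob import BlobServiceClient

    # Placeholder connection string and names -- substitute your own storage account.
    service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    blob = service.get_blob_client(container="ml-datasets", blob="joined_rows.csv")

    # Upload the exported file; ML Studio's Import Data module can then read it
    # from Azure Blob Storage instead of pulling rows from Cosmos DB directly.
    with open("joined_rows.csv", "rb") as data:
        blob.upload_blob(data, overwrite=True)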

Regards,
Jaya


