How to insert into documentDB from Excel file containing 5000 records?


Problem description

I have an Excel file that originally had about 200 rows. I was able to convert the Excel file to a data table, and everything got inserted into documentdb correctly.

The Excel file now has 5000 rows; insertion stops after 30-40 records, and the rest of the rows are not inserted into documentdb.

I found the following exception:

Microsoft.Azure.Documents.DocumentClientException: Exception: Microsoft.Azure.Documents.RequestRateTooLargeException, message: {"Errors":["Request rate is large"]}

My code is:

    var service = new Service();
    var students = new List<Student>();
    foreach (var data in exceldata) // exceldata contains the set of rows
    {
        var student = new Student();
        student.name = data.name;
        student.age = data.age;
        student.@class = data.@class; // "class" is reserved in C#; escape it with @
        // collectionLink is a string stored in web.config
        student.id = await service.AddDocument(collectionLink, student);
        students.Add(student);
    }

    class Service
    {
        public async Task<string> AddDocument(string collectionLink, Student data)
        {
            this.DeserializePayload(data);
            var result = await Client.CreateDocumentAsync(collectionLink, data);
            return result.Resource.Id;
        }
    }



Am I doing anything wrong? Any help would be greatly appreciated.

Answer

UPDATE:

As of 4/8/15, DocumentDB has released a data import tool, which supports JSON files, MongoDB, SQL Server, and CSV files. You can find it here: http://www.microsoft.com/en-us/download/details.aspx?id=46436

In this case, you can save your Excel file as a CSV and then bulk-import the records using the data import tool.
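The tool's download also includes a command-line executable (dt.exe). Below is a rough sketch of a CSV import; the switch names are taken from the tool's documentation and may differ by version, and students.csv, MyDb, and Students are hypothetical names, so check the tool's built-in help for the exact options:

    REM Hedged sketch: bulk-import a CSV into a DocumentDB collection.
    REM <account> and <key> are placeholders for your account endpoint and key.
    dt.exe /s:CsvFile /s.Files:students.csv ^
           /t:DocumentDBBulk ^
           /t.ConnectionString:"AccountEndpoint=https://<account>.documents.azure.com:443/;AccountKey=<key>;Database=MyDb" ^
           /t.Collection:Students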

Original answer:

DocumentDB collections are each provisioned with 2,000 request units (RUs) per second. It's important to note that the limits are expressed in request units, not requests; writing larger documents costs more than writing smaller ones, and scans are more expensive than index seeks.

You can measure the overhead of any operation (CRUD) by inspecting the x-ms-request-charge HTTP response header, or the RequestCharge property of the ResourceResponse/FeedResponse objects returned by the SDK.
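For example, with the DocumentDB .NET SDK you can read the charge directly from the write response. This is a minimal sketch, assuming an existing DocumentClient, a collection self-link, and the Student type from the question; InsertAndMeasureAsync is a hypothetical helper name:

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.Documents.Client;

    class RequestChargeSample
    {
        // Writes one document and logs how many request units (RUs) it consumed.
        static async Task<double> InsertAndMeasureAsync(
            DocumentClient client, string collectionLink, Student student)
        {
            ResourceResponse<Document> response =
                await client.CreateDocumentAsync(collectionLink, student);
            Console.WriteLine("Request charge: {0} RUs", response.RequestCharge);
            return response.RequestCharge;
        }
    }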

A RequestRateTooLargeException is thrown when you exhaust the provisioned throughput. Some solutions include:


  • Back off with a short delay and retry whenever you encounter the exception. A recommended retry delay is included in the x-ms-retry-after-ms HTTP response header. Alternatively, you could simply batch requests with a short delay (see the retry sketch after this list).
  • Use lazy indexing for a faster ingestion rate. DocumentDB allows you to specify indexing policies at the collection level. By default, the index is updated synchronously on each write to the collection. This enables queries to honor the same consistency level as the document reads, without any delay for the index to "catch up". Lazy indexing can be used to amortize the work required to index content over a longer period of time. It is important to note, however, that when lazy indexing is enabled, query results will be eventually consistent regardless of the consistency level configured for the DocumentDB account.
  • As mentioned, each collection has a limit of 2,000 RUs - you can increase throughput by sharding / partitioning your data across multiple collections and capacity units.
  • Delete empty collections to utilize all provisioned throughput - every document collection created in a DocumentDB account is allocated reserved throughput capacity based on the number of Capacity Units (CUs) provisioned and the number of collections created. A single CU makes 2,000 request units (RUs) available and supports up to 3 collections. If only one collection is created for the CU, the entire CU throughput will be available to that collection. Once a second collection is created, the throughput of the first collection will be halved and given to the second collection, and so on. To maximize the throughput available per collection, I'd recommend keeping the ratio of capacity units to collections at 1:1.
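Here is a minimal retry sketch for the first option, assuming the DocumentDB .NET SDK: DocumentClientException surfaces the server-suggested delay as a RetryAfter TimeSpan, and the helper below (CreateWithRetryAsync is a hypothetical name, with the Student type from the question) retries on HTTP 429:

    using System;
    using System.Net;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.Documents.Client;

    class ThrottleRetrySample
    {
        // Retries a document write whenever the service signals "Request rate is large"
        // (HTTP 429), waiting for the delay suggested by x-ms-retry-after-ms.
        static async Task<string> CreateWithRetryAsync(
            DocumentClient client, string collectionLink, Student student)
        {
            while (true)
            {
                try
                {
                    var response = await client.CreateDocumentAsync(collectionLink, student);
                    return response.Resource.Id;
                }
                catch (DocumentClientException ex)
                {
                    if (ex.StatusCode != (HttpStatusCode)429)
                        throw; // not a throttling error; rethrow
                    // RetryAfter mirrors the x-ms-retry-after-ms response header.
                    await Task.Delay(ex.RetryAfter);
                }
            }
        }
    }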

References:

  • DocumentDB Performance Tips: http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/
  • DocumentDB Limits: http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/
