How to insert into DocumentDB from an Excel file containing 5000 records?

Problem Description

I have an Excel file that originally had about 200 rows. I was able to convert the Excel file to a data table, and everything was inserted into DocumentDB correctly.

The Excel file now has 5000 rows, and insertion stops after 30-40 records; the rest of the rows never make it into DocumentDB.

I get the following exception:

Microsoft.Azure.Documents.DocumentClientException: Exception: Microsoft.Azure.Documents.RequestRateTooLargeException, message: {"Errors":["Request rate is large"]}

My code is:

    Service service = new Service();
    var students = new List<Student>();
    foreach (var data in exceldata) // exceldata contains the set of rows
    {
        var student = new Student();
        student.name = data.name;
        student.age = data.age;
        student.@class = data.@class; // "class" is a C# keyword, so it must be escaped with @
        // collectionLink is a string stored in web.config; this runs inside an async method
        student.id = await service.AddDocument(collectionLink, student);
        students.Add(student);
    }

    class Service
    {
        public async Task<string> AddDocument(string collectionLink, Student data)
        {
            this.DeserializePayload(data);
            var result = await Client.CreateDocumentAsync(collectionLink, data);
            return result.Resource.Id;
        }
    }

Am I doing anything wrong? Any help would be greatly appreciated.

Solution

Update:

As of 4/8/15, DocumentDB has released a data import tool, which supports JSON files, MongoDB, SQL Server, and CSV files. You can find it here: http://www.microsoft.com/en-us/download/details.aspx?id=46436

In this case, you can save your Excel file as a CSV and then bulk-import records using the data import tool.
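For illustration, if the spreadsheet were exported as students.csv, a bulk import with the tool's command-line version (dt.exe) might look like the sketch below. The file name, collection name, and connection string are placeholders, and the exact switch names should be verified against the tool's documentation:

    dt.exe /s:CsvFile /s.Files:students.csv ^
        /t:DocumentDBBulk ^
        /t.ConnectionString:"AccountEndpoint=https://<account>.documents.azure.com:443/;AccountKey=<key>;Database=<database>" ^
        /t.Collection:students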

Original Answer:

DocumentDB collections are provisioned with 2,000 request units per second. It's important to note that the limits are expressed in request units, not requests; writing larger documents therefore costs more than writing smaller ones, and scans are more expensive than index seeks.

You can measure the overhead of any operations (CRUD) by inspecting the x-ms-request-charge HTTP response header or the RequestCharge property in the ResourceResponse/FeedResponse objects returned by the SDK.
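For example, with the .NET SDK the charge for a single insert can be read off the response like this (Client, collectionLink, and student are the names from the question's code):

    // CreateDocumentAsync returns a ResourceResponse whose RequestCharge
    // property mirrors the x-ms-request-charge header, in request units.
    ResourceResponse<Document> response = await Client.CreateDocumentAsync(collectionLink, student);
    Console.WriteLine("This insert consumed {0} RUs", response.RequestCharge);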

A RequestRateTooLargeException is thrown when you exhaust the provisioned throughput. Some solutions include:

  • Back off with a short delay and retry whenever you encounter the exception. A recommended retry delay is included in the x-ms-retry-after-ms HTTP response header. Alternatively, you could simply batch requests with a short delay between batches (see the retry sketch after this list).
  • Use lazy indexing for a faster ingestion rate (a sketch follows after this list). DocumentDB allows you to specify indexing policies at the collection level. By default, the index is updated synchronously on each write to the collection. This enables queries to honor the same consistency level as document reads, without any delay for the index to "catch up". Lazy indexing can be used to amortize the work required to index content over a longer period of time. It is important to note, however, that when lazy indexing is enabled, query results will be eventually consistent regardless of the consistency level configured for the DocumentDB account.
  • As mentioned, each collection has a limit of 2,000 RUs - you can increase throughput by sharding / partitioning your data across multiple collections and capacity units.
  • Delete empty collections to utilize all provisioned throughput - every document collection created in a DocumentDB account is allocated reserved throughput capacity based on the number of Capacity Units (CUs) provisioned, and the number of collections created. A single CU makes available 2,000 request units (RUs) and supports up to 3 collections. If only one collection is created for the CU, the entire CU throughput will be available for the collection. Once a second collection is created, the throughput of the first collection will be halved and given to the second collection, and so on. To maximize throughput available per collection, I'd recommend the number of capacity units to collections is 1:1.
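As a sketch of the first bullet, here is a hypothetical retry helper for the .NET SDK. It assumes throttled calls surface as a DocumentClientException with HTTP status 429 and a server-suggested delay in the exception's RetryAfter property:

    // Requires: using System; using System.Threading.Tasks; using Microsoft.Azure.Documents;
    // Hypothetical helper: retries a DocumentDB operation whenever the
    // service throttles it (HTTP 429, "Request rate is large").
    static async Task<T> ExecuteWithRetryAsync<T>(Func<Task<T>> operation)
    {
        while (true)
        {
            try
            {
                return await operation();
            }
            catch (DocumentClientException ex)
            {
                // Anything other than throttling is a real error; rethrow it.
                if ((int?)ex.StatusCode != 429) throw;
                // RetryAfter is populated from the x-ms-retry-after-ms header.
                await Task.Delay(ex.RetryAfter);
            }
        }
    }

Your insert loop could then wrap each call:

    student.id = await ExecuteWithRetryAsync(() => service.AddDocument(collectionLink, student));

And for the lazy-indexing bullet, a minimal sketch of creating a collection with a lazy indexing policy (the names databaseLink and "students" are illustrative):

    // Lazy indexing trades query freshness for faster ingestion.
    var collection = new DocumentCollection { Id = "students" };
    collection.IndexingPolicy.IndexingMode = IndexingMode.Lazy;
    await Client.CreateDocumentCollectionAsync(databaseLink, collection);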

