Fastest way to insert 100,000+ records into DocumentDB


Problem Description

As the title suggests, I need to insert 100,000+ records into a DocumentDb collection programmatically. The data will be used for creating reports later on. I am using the Azure Documents SDK and a stored procedure for bulk inserting documents (see question Azure documentdb bulk insert using stored procedure).

The following console application shows how I'm inserting documents.

InsertDocuments generates 500 test documents to pass to the stored procedure. The main function calls InsertDocuments 10 times, inserting 5,000 documents overall. Running this application results in 500 documents getting inserted every few seconds. If I increase the number of documents per call, I start to get errors and lose documents.

Can anyone recommend a faster way to insert documents?

static void Main(string[] args)
{
    Console.WriteLine("Starting...");

    MainAsync().Wait();
}

static async Task MainAsync()
{
    int campaignId = 1001,
        count = 500;

    for (int i = 0; i < 10; i++)
    {
        await InsertDocuments(campaignId, (count * i) + 1, (count * i) + count);
    }
}

static async Task InsertDocuments(int campaignId, int startId, int endId)
{
    using (DocumentClient client = new DocumentClient(new Uri(documentDbUrl), documentDbKey))
    {
        List<dynamic> items = new List<dynamic>();

        // Create x number of documents to insert
        for (int i = startId; i <= endId; i++)
        {
            var item = new
            {
                id = Guid.NewGuid(),
                campaignId = campaignId,
                userId = i,
                status = "Pending"
            };

            items.Add(item);
        }

        try
        {
            var response = await client.ExecuteStoredProcedureAsync<dynamic>(
                "/dbs/default/colls/campaignusers/sprocs/bulkImport",
                new RequestOptions()
                {
                    PartitionKey = new PartitionKey(campaignId)
                },
                new
                {
                    items = items
                });

            int insertCount = (int)response.Response;

            Console.WriteLine("{0} documents inserted...", insertCount);
        }
        catch (Exception e)
        {
            Console.WriteLine("Error: {0}", e.Message);
        }
    }
}
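
For context, the bulkImport stored procedure referenced above must be registered on the collection before it can be executed. A minimal sketch, using the question's database/collection names (`default`/`campaignusers`) and assuming the JavaScript body lives in a hypothetical local file `bulkImport.js`:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class RegisterSproc
{
    static async Task RegisterAsync(DocumentClient client)
    {
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri("default", "campaignusers");

        // Upload the JavaScript body as a stored procedure named "bulkImport".
        await client.CreateStoredProcedureAsync(collectionUri, new StoredProcedure
        {
            Id = "bulkImport",
            Body = File.ReadAllText("bulkImport.js") // hypothetical local file
        });
    }
}
```

This only needs to run once per collection; subsequent calls with the same Id will fail with a conflict unless the existing sproc is deleted or replaced first.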

Answer

The fastest way to insert documents into Azure DocumentDB is available as a sample on GitHub: https://github.com/Azure/azure-documentdb-dotnet/tree/master/samples/documentdb-benchmark

The following tips will help you achieve the best throughput using the .NET SDK:

  • Initialize a singleton DocumentClient
  • Use Direct connectivity and TCP protocol (ConnectionMode.Direct and ConnectionProtocol.Tcp)
  • Use 100s of Tasks in parallel (depends on your hardware)
  • Increase the MaxConnectionLimit in the DocumentClient constructor to a high value, say 1000 connections
  • Turn gcServer on
  • Make sure your collection has the appropriate provisioned throughput (and a good partition key)
  • Running in the same Azure region will also help
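
Taken together, the client-side tips above can be sketched as follows. This is a minimal illustration against the classic Microsoft.Azure.Documents .NET SDK; the endpoint, key, database/collection names, and document shape are placeholders, and the degree of parallelism should be tuned to your hardware and provisioned RU/s.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

class BulkInsertSketch
{
    // One DocumentClient for the whole process: Direct mode, TCP, high connection limit.
    private static readonly DocumentClient Client = new DocumentClient(
        new Uri("https://myaccount.documents.azure.com:443/"), // placeholder endpoint
        "<primary-key>",                                       // placeholder key
        new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct,
            ConnectionProtocol = Protocol.Tcp,
            MaxConnectionLimit = 1000
        });

    static async Task BulkInsertAsync(IEnumerable<dynamic> documents, int parallelism = 100)
    {
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri("default", "campaignusers");

        // Fan the documents out over many concurrent CreateDocumentAsync calls.
        var queue = new Queue<dynamic>(documents);
        var workers = Enumerable.Range(0, parallelism).Select(async _ =>
        {
            while (true)
            {
                dynamic doc;
                lock (queue)
                {
                    if (queue.Count == 0) return;
                    doc = queue.Dequeue();
                }
                await Client.CreateDocumentAsync(collectionUri, doc);
            }
        });

        await Task.WhenAll(workers);
    }
}
```

Server GC is enabled in configuration rather than in code: add `<gcServer enabled="true"/>` under `<runtime>` in App.config.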

With 10,000 RU/s, you can insert 100,000 documents in about 50 seconds: at approximately 5 request units per write, 100,000 writes cost about 500,000 RU, and 500,000 RU ÷ 10,000 RU/s = 50 s.

With 100,000 RU/s, the same insert completes in about 5 seconds. You can make this as fast as you want by provisioning more throughput (and, for very high insert volumes, by spreading inserts across multiple VMs/workers).

Edit (7/12/19): You can now use the bulk executor library, documented at https://docs.microsoft.com/en-us/azure/cosmos-db/bulk-executor-overview
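
Usage of the bulk executor library looks roughly like the following. This is a sketch from memory of the Microsoft.Azure.CosmosDB.BulkExecutor NuGet package; method names and response properties should be verified against the linked overview before use.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.CosmosDB.BulkExecutor;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class BulkExecutorSketch
{
    static async Task ImportAsync(DocumentClient client, DocumentCollection collection,
                                  IEnumerable<string> documents)
    {
        // The executor reads the collection's partitioning scheme on initialization
        // and handles batching, parallelism, and throttling retries internally.
        var bulkExecutor = new BulkExecutor(client, collection);
        await bulkExecutor.InitializeAsync();

        var response = await bulkExecutor.BulkImportAsync(
            documents,
            enableUpsert: true,
            disableAutomaticIdGeneration: true);

        Console.WriteLine("Imported {0} documents, consuming {1} RUs total",
            response.NumberOfDocumentsImported,
            response.TotalRequestUnitsConsumed);
    }
}
```

The library replaces hand-rolled sprocs and parallel-task code: it saturates the collection's provisioned throughput for you, which is why it is now the recommended approach.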
