Fastest way to insert 100,000+ records into DocumentDB

Question

As the title suggests, I need to insert 100,000+ records into a DocumentDb collection programmatically. The data will be used for creating reports later on. I am using the Azure Documents SDK and a stored procedure for bulk inserting documents (see question Azure documentdb bulk insert using stored procedure).

The following console application shows how I'm inserting documents.

InsertDocuments generates 500 test documents to pass to the stored procedure. The main function calls InsertDocuments 10 times, inserting 5,000 documents overall. Running this application results in 500 documents getting inserted every few seconds. If I increase the number of documents per call I start to get errors and lost documents.

Can anyone recommend a faster way to insert documents?

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// documentDbUrl and documentDbKey are the account endpoint and key (defined elsewhere).

static void Main(string[] args)
{
    Console.WriteLine("Starting...");

    MainAsync().Wait();
}

static async Task MainAsync()
{
    int campaignId = 1001,
        count = 500;

    // Call InsertDocuments 10 times with 500 documents each (5,000 total).
    for (int i = 0; i < 10; i++)
    {
        await InsertDocuments(campaignId, (count * i) + 1, (count * i) + count);
    }
}

static async Task InsertDocuments(int campaignId, int startId, int endId)
{
    using (DocumentClient client = new DocumentClient(new Uri(documentDbUrl), documentDbKey))
    {
        List<dynamic> items = new List<dynamic>();

        // Create the batch of documents to insert
        for (int i = startId; i <= endId; i++)
        {
            var item = new
            {
                id = Guid.NewGuid(),
                campaignId = campaignId,
                userId = i,
                status = "Pending"
            };

            items.Add(item);
        }

        try
        {
            // Execute the bulkImport stored procedure, scoped to a single partition key.
            StoredProcedureResponse<dynamic> response = await client.ExecuteStoredProcedureAsync<dynamic>(
                "/dbs/default/colls/campaignusers/sprocs/bulkImport",
                new RequestOptions()
                {
                    PartitionKey = new PartitionKey(campaignId)
                },
                new
                {
                    items = items
                });

            int insertCount = (int)response.Response;

            Console.WriteLine("{0} documents inserted...", insertCount);
        }
        catch (Exception e)
        {
            Console.WriteLine("Error: {0}", e.Message);
        }
    }
}

Answer

The fastest way to insert documents into Azure DocumentDB is available as a sample on GitHub: https://github.com/Azure/azure-documentdb-dotnet/tree/master/samples/documentdb-benchmark

The following tips will help you achieve the best throughput using the .NET SDK (a sketch combining several of them follows the list):

  • Initialize a singleton DocumentClient
  • Use Direct connectivity and TCP protocol (ConnectionMode.Direct and ConnectionProtocol.Tcp)
  • Use 100s of Tasks in parallel (depends on your hardware)
  • Increase the MaxConnectionLimit in the DocumentClient constructor to a high value, say 1000 connections
  • Turn gcServer on
  • Make sure your collection has the appropriate provisioned throughput (and a good partition key)
  • Running in the same Azure region will also help
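
As a rough illustration (not part of the original answer), here is a minimal sketch combining most of these tips: a singleton DocumentClient using Direct/TCP with a raised MaxConnectionLimit, and many parallel writer tasks. The endpoint, key, database/collection names, and counts are placeholders. gcServer is not set in code; it is enabled in App.config with <gcServer enabled="true"/> under <runtime>.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

public static class ParallelInserter
{
    // Placeholder credentials; substitute your account values.
    private static readonly string endpoint = "<account-endpoint>";
    private static readonly string authKey = "<account-key>";

    // Tip 1: a single DocumentClient for the process lifetime.
    // Tips 2 and 4: Direct mode over TCP, with a raised connection limit.
    private static readonly DocumentClient client = new DocumentClient(
        new Uri(endpoint), authKey,
        new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct,
            ConnectionProtocol = Protocol.Tcp,
            MaxConnectionLimit = 1000
        });

    public static async Task InsertAsync(int campaignId, int totalDocs, int taskCount)
    {
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri("default", "campaignusers");
        int docsPerTask = totalDocs / taskCount;

        // Tip 3: many writer tasks in parallel, each issuing sequential
        // CreateDocumentAsync calls against the shared client.
        IEnumerable<Task> writers = Enumerable.Range(0, taskCount).Select(async t =>
        {
            for (int i = 0; i < docsPerTask; i++)
            {
                await client.CreateDocumentAsync(collectionUri, new
                {
                    id = Guid.NewGuid().ToString(),
                    campaignId = campaignId,          // partition key value
                    userId = (t * docsPerTask) + i + 1,
                    status = "Pending"
                });
            }
        });

        await Task.WhenAll(writers);
    }
}

For example, InsertAsync(1001, 100000, 100) runs 100 tasks of 1,000 sequential writes each; at that scale, throughput is governed mainly by the collection's provisioned RU/s rather than by client-side latency.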

With 10,000 RU/s, you can insert 100,000 documents in about 50 seconds (approximately 5 request units per write: 100,000 writes × 5 RU ≈ 500,000 RU total, and 500,000 RU ÷ 10,000 RU/s = 50 seconds).

With 100,000 RU/s, you can insert the same 100,000 documents in about 5 seconds. You can make this as fast as you want by configuring throughput (and, for a very high number of inserts, by spreading the inserts across multiple VMs/workers).

Update (7/12/19): You can now use the bulk executor library: https://docs.microsoft.com/en-us/azure/cosmos-db/bulk-executor-overview
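
A minimal sketch of that approach, assuming the Microsoft.Azure.CosmosDB.BulkExecutor NuGet package and the database/collection names from the question (the options shown are illustrative, not prescriptive):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.CosmosDB.BulkExecutor;
using Microsoft.Azure.CosmosDB.BulkExecutor.BulkImport;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static async Task BulkImportCampaignAsync(DocumentClient client, int campaignId, int count)
{
    // Read the target collection so the executor can learn its partitioning.
    DocumentCollection collection = await client.ReadDocumentCollectionAsync(
        UriFactory.CreateDocumentCollectionUri("default", "campaignusers"));

    IBulkExecutor executor = new BulkExecutor(client, collection);
    await executor.InitializeAsync();

    // Generate the documents to import.
    IEnumerable<object> documents = Enumerable.Range(1, count).Select(i => new
    {
        id = Guid.NewGuid().ToString(),
        campaignId = campaignId,
        userId = i,
        status = "Pending"
    });

    // The library batches the writes, retries on throttling, and drives
    // the collection at its provisioned throughput.
    BulkImportResponse response = await executor.BulkImportAsync(documents, enableUpsert: false);

    Console.WriteLine("Imported {0} documents ({1} RUs, {2})",
        response.NumberOfDocumentsImported,
        response.TotalRequestUnitsConsumed,
        response.TotalTimeTaken);
}

The documentation also suggests setting the client's RetryOptions (MaxRetryAttemptsOnThrottledRequests and MaxRetryWaitTimeInSeconds) to 0 so that throttling retries are handled entirely by the bulk executor.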
