Importing chunks of CSV rows with Sidekiq, Resque, etc


Question

I'm writing an importer that imports data from a CSV file into a DB table. To avoid loading the whole file into memory, I'm using Smarter CSV to parse the file into chunks of 100 rows and load one chunk at a time.

I'll be passing each chunk of 100 to a background job processor such as Resque or Sidekiq to import those rows in bulk.

1. Passing 100 rows as a job argument results in a string that's about 5,000 characters long. Does this cause any problems in general, or particularly with the back-end store (e.g. Sidekiq uses Redis - does Redis allow storing a key of that length)? I don't want to import one row at a time because that would create 50,000 jobs for a 50,000-row file.

2. I want to know the progress of the overall import, so I planned to have each job (chunk of 100) update a DB field and increase the count by 1 when it's done (not sure if there's a better approach?). Since these jobs process in parallel, is there any danger of two jobs trying to update the same field by 1 and overwriting each other? Or do DB writes lock the table so only one can write at a time?

Thanks!

Answer

Passing 100 rows as a job argument results in a string that's about 5,000 characters long.

Redis can handle that without problems.
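For scale: a chunk of 100 row-hashes serializes to a few kilobytes of JSON, and a single Redis string value can hold up to 512 MB, so the payload is nowhere near any limit. As a rough sketch of the worker side (assuming Sidekiq and a hypothetical Rails model named Product; the worker name and column handling are illustrative, not taken from the question):

    # Minimal sketch of a chunk-import worker; ImportChunkWorker and the
    # Product model are hypothetical names.
    class ImportChunkWorker
      include Sidekiq::Worker

      # rows arrives as an array of up to 100 hashes. Sidekiq serializes
      # job arguments to JSON, so symbol keys from the CSV parser come
      # back as string keys here.
      def perform(rows)
        # One multi-row INSERT instead of 100 single inserts (Rails 6+);
        # the activerecord-import gem offers the same on older Rails.
        Product.insert_all(rows)
      end
    end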

Since these jobs process in parallel, is there any danger of two jobs trying to update the same field by 1 and overwriting each other?

If you do read + set, then yes, it's subject to race conditions. You can leverage Redis for the task and use its atomic INCR.
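Inside the worker above, that could look like the sketch below; the import_id, the Redis key name, and the Import model are hypothetical. The race only exists if the counter is read into Ruby, incremented, and written back, so pushing the increment into Redis (or into SQL) sidesteps it:

    # Option 1: atomic counter in Redis, using Sidekiq's own connection
    # pool; concurrent INCRs never lose updates.
    Sidekiq.redis { |conn| conn.incr("import:#{import_id}:chunks_done") }

    # Option 2: an atomic increment in the database, expressed in SQL
    # rather than as read-modify-write in Ruby.
    Import.where(id: import_id).update_all("processed_chunks = processed_chunks + 1")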

To avoid loading the whole file into memory, I'm using Smarter CSV to parse the file into chunks of 100

Depends on what you're doing with those rows, but 50k rows by themselves are not a great strain on memory, I'd say.
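For completeness, the enqueueing side could look like this sketch, assuming the smarter_csv gem and the hypothetical ImportChunkWorker above; the file name and chunk size are illustrative. Only one chunk is held in memory at a time, and each chunk becomes one job:

    require 'smarter_csv'

    total_chunks = 0
    SmarterCSV.process('products.csv', chunk_size: 100) do |chunk|
      # chunk is an array of up to 100 row hashes with symbol keys;
      # stringify them so the job payload is made of plain JSON types.
      ImportChunkWorker.perform_async(chunk.map { |row| row.transform_keys(&:to_s) })
      total_chunks += 1
    end

    # total_chunks is the denominator for the progress counter above.
    puts "enqueued #{total_chunks} jobs"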

