Split huge (95Mb) JSON array into smaller chunks?


Problem description

I exported some data from my database in the form of JSON, which is essentially just one [list] with a bunch (900K) of {objects} inside it.

Trying to import it on my production server now, but I've got some cheap web hosting, and they don't like it when I eat all their resources for 10 minutes.

How can I split this file into smaller chunks so that I can import it piece by piece?

Edit: Actually, it's a PostgreSQL database. I'm open to other suggestions on how I can export all the data in chunks. I've got phpPgAdmin installed on my server, which supposedly can accept CSV, Tabbed and XML formats.
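
If re-exporting is an option, one way to avoid the single 95Mb file altogether is to pull the rows out of PostgreSQL in fixed-size chunks. A rough sketch, assuming a table named postal_codes with an id column (both names, the 50,000 chunk size and the 18 iterations are placeholders chosen to match the ~900K rows mentioned above):

for i in $(seq 0 17); do
  OFFSET=$((i * 50000))
  # \copy runs client-side, so the CSV files are written on the machine running psql
  psql -U username -d database -c "\copy (SELECT * FROM postal_codes ORDER BY id LIMIT 50000 OFFSET $OFFSET) TO 'postal_codes_$i.csv' WITH CSV HEADER"
done

Each resulting CSV is small enough to upload and import separately, either through phpPgAdmin's CSV import or with a matching \copy ... FROM on the server side.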

I had to fix phihag's script:

import json

# Python 2 script (xrange); on Python 3, use range instead.
with open('fixtures/PostalCodes.json','r') as infile:
  o = json.load(infile)    # load the whole exported array into memory
  chunkSize = 50000        # objects per output file
  for i in xrange(0, len(o), chunkSize):
    # writes fixtures/postalcodes_00.json, postalcodes_01.json, ...
    with open('fixtures/postalcodes_' + ('%02d' % (i//chunkSize)) + '.json','w') as outfile:
      json.dump(o[i:i+chunkSize], outfile)

dump:

pg_dump -U username -t table database > filename

restore:

psql -U username < filename

(I don't know what the heck pg_restore does, but it gives me errors)

The tutorials on this conveniently leave this information out, esp. the -U option which is probably necessary in most circumstances. Yes, the man pages explain this, but it's always a pain to sift through 50 options you don't care about.

I ended up going with Kenny's suggestion... although it was still a major pain. I had to dump the table to a file, compress it, upload it, extract it, then I tried to import it, but the data was slightly different on production and there were some missing foreign keys (postalcodes are attached to cities). Of course, I couldn't just import the new cities, because then it throws a duplicate key error instead of silently ignoring it, which would have been nice. So I had to empty that table, repeat the process for cities, only to realize something else was tied to cities, so I had to empty that table too. Got the cities back in, then finally I could import my postal codes. By now I've obliterated half my database because everything is tied to everything and I've had to recreate all the entries. Lovely. Good thing I haven't launched the site yet. Also, "emptying" or truncating a table doesn't seem to reset the sequences/autoincrements, which I'd like, because there are a couple of magic entries I want to have ID 1. So I'd have to delete or reset those too (I don't know how), so I manually edited the PKs for those back to 1.
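
For the record, the sequence reset can be done directly in SQL while emptying the table. A minimal sketch, assuming default SERIAL-style sequences; the cities table and cities_id_seq names are placeholders:

# empty the table, reset its sequence, and cascade to tables that reference it
psql -U username -d database -c "TRUNCATE cities RESTART IDENTITY CASCADE;"
# or reset just the sequence back to 1 without touching the data
psql -U username -d database -c "ALTER SEQUENCE cities_id_seq RESTART WITH 1;"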

I would have run into similar problems with phihag's solution, plus I would have had to import 17 files one at a time, unless I wrote another import script to match the export script. Although he did answer my question literally, so thanks.
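
For completeness, importing the chunk files one at a time can be scripted rather than done by hand. The fixtures/ path in the script above hints that these are Django fixtures, so a loop like the following might do; manage.py loaddata is an assumption here, substitute whatever per-file import command actually applies:

# load each chunk file in turn, so only one chunk's worth of data is handled at a time
for f in fixtures/postalcodes_*.json; do
  python manage.py loaddata "$f"
done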

Recommended answer

Assuming you have the option to go back and export the data again...:

pg_dump - extract a PostgreSQL database into a script file or other archive file.

pg_restore - restore a PostgreSQL database from an archive file created by pg_dump.
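
One detail worth spelling out, since it is likely the source of the pg_restore errors mentioned earlier: pg_restore only reads pg_dump's archive formats, not plain SQL dumps. A minimal sketch using the custom format (table and database names are placeholders):

# dump a single table as a custom-format archive (-Fc)
pg_dump -U username -Fc -t table database > table.dump
# pg_restore understands that archive and loads it into the target database
pg_restore -U username -d database table.dump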

If that's no use, it might be useful to know what you're going to be doing with the output so that another suggestion can hit the mark.
