Split huge (95Mb) JSON array into smaller chunks?


Problem description

I exported some data from my database in the form of JSON, which is essentially just one [list] with a bunch (900K) of {objects} inside it.

Trying to import it on my production server now, but I've got some cheap web hosting, and they don't like it when I eat all their resources for 10 minutes.

How can I split this file into smaller chunks so that I can import it piece by piece?

Edit: Actually, it's a PostgreSQL database. I'm open to other suggestions on how I can export all the data in chunks. I've got phpPgAdmin installed on my server, which supposedly can accept CSV, Tabbed and XML formats.
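
If re-exporting is an option, one way to avoid the single 95Mb file altogether is to pull the rows out of PostgreSQL in fixed-size chunks. A rough sketch, assuming a table named postal_codes with an id column (both names, the 50,000 chunk size and the 18 iterations are placeholders chosen to match the ~900K rows mentioned above):

for i in $(seq 0 17); do
  OFFSET=$((i * 50000))
  # \copy runs client-side, so the CSV files are written on the machine running psql
  psql -U username -d database -c "\copy (SELECT * FROM postal_codes ORDER BY id LIMIT 50000 OFFSET $OFFSET) TO 'postal_codes_$i.csv' WITH CSV HEADER"
done

Each resulting CSV is small enough to upload and import separately, either through phpPgAdmin's CSV import or with a matching \copy ... FROM on the server side.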

I had to fix phihag's script:

import json

# Python 2 script (xrange); on Python 3, use range instead.
with open('fixtures/PostalCodes.json','r') as infile:
  o = json.load(infile)    # load the whole exported array into memory
  chunkSize = 50000        # objects per output file
  for i in xrange(0, len(o), chunkSize):
    # writes fixtures/postalcodes_00.json, postalcodes_01.json, ...
    with open('fixtures/postalcodes_' + ('%02d' % (i//chunkSize)) + '.json','w') as outfile:
      json.dump(o[i:i+chunkSize], outfile)

dump:

pg_dump -U username -t table database > filename

restore:

psql -U username < filename

(I don't know what the heck pg_restore does, but it gives me errors)

The tutorials on this conveniently leave this information out, esp. the -U option which is probably necessary in most circumstances. Yes, the man pages explain this, but it's always a pain to sift through 50 options you don't care about.

I ended up going with Kenny's suggestion... although it was still a major pain. I had to dump the table to a file, compress it, upload it, extract it, then I tried to import it, but the data was slightly different on production and there were some missing foreign keys (postalcodes are attached to cities). Of course, I couldn't just import the new cities, because then it throws a duplicate key error instead of silently ignoring it, which would have been nice. So I had to empty that table, repeat the process for cities, only to realize something else was tied to cities, so I had to empty that table too. Got the cities back in, then finally I could import my postal codes. By now I've obliterated half my database because everything is tied to everything and I've had to recreate all the entries. Lovely. Good thing I haven't launched the site yet. Also, "emptying" or truncating a table doesn't seem to reset the sequences/autoincrements, which I'd like, because there are a couple of magic entries I want to have ID 1. So I'd have to delete or reset those too (I don't know how), so I manually edited the PKs for those back to 1.
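
For the record, the sequence reset can be done directly in SQL while emptying the table. A minimal sketch, assuming default SERIAL-style sequences; the cities table and cities_id_seq names are placeholders:

# empty the table, reset its sequence, and cascade to tables that reference it
psql -U username -d database -c "TRUNCATE cities RESTART IDENTITY CASCADE;"
# or reset just the sequence back to 1 without touching the data
psql -U username -d database -c "ALTER SEQUENCE cities_id_seq RESTART WITH 1;"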

I would have run into similar problems with phihag's solution, plus I would have had to import 17 files one at a time, unless I wrote another import script to match the export script. Although he did answer my question literally, so thanks.
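
For completeness, importing the chunk files one at a time can be scripted rather than done by hand. The fixtures/ path in the script above hints that these are Django fixtures, so a loop like the following might do; manage.py loaddata is an assumption here, substitute whatever per-file import command actually applies:

# load each chunk file in turn, so only one chunk's worth of data is handled at a time
for f in fixtures/postalcodes_*.json; do
  python manage.py loaddata "$f"
done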

Recommended answer

Assuming you have the option to go back and export the data again...:

pg_dump - extract a PostgreSQL database into a script file or other archive file.

pg_restore - restore a PostgreSQL database from an archive file created by pg_dump.
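
One detail worth spelling out, since it is likely the source of the pg_restore errors mentioned earlier: pg_restore only reads pg_dump's archive formats, not plain SQL dumps. A minimal sketch using the custom format (table and database names are placeholders):

# dump a single table as a custom-format archive (-Fc)
pg_dump -U username -Fc -t table database > table.dump
# pg_restore understands that archive and loads it into the target database
pg_restore -U username -d database table.dump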

If that's no use, it might be useful to know what you're going to be doing with the output so that another suggestion can hit the mark.
