How to Import a Big XML File (~10GB) into PostgreSQL


Question

I have an XML file of about 10GB. I don't know what it contains, but I would like to import it into my database to make it easy to view.

How can I import an XML file into my PostgreSQL database? (Is this even possible with such a large file?)

I hope you guys can help me out :)

Recommended answer

  1. Convert the XML file into CSV. While converting, split the output into parts of 100 MB to 1 GB each for easier batching.
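The conversion step could be sketched roughly as below. This is only an illustration under assumptions the question does not confirm: the file is taken to be a flat sequence of record elements directly under the root, and `record_tag`, `columns`, `open_part` and `rows_per_part` are hypothetical names (a row count stands in for the 100 MB to 1 GB size limit). It uses `iterparse` from the standard library so the whole file is never held in memory.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv_parts(source, record_tag, columns, open_part, rows_per_part=1_000_000):
    """Stream `source` and write every <record_tag> element as one CSV row.

    `open_part(index)` must return a writable text file for part `index`.
    Returns the number of parts written.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    part_index, rows_in_part = 0, 0
    out = writer = None
    for event, elem in context:
        if event != "end" or elem.tag != record_tag:
            continue
        if out is None:
            out = open_part(part_index)
            writer = csv.writer(out)
            writer.writerow(columns)  # every part gets its own header row
        # A missing child element becomes an empty CSV field.
        writer.writerow([elem.findtext(col, default="") for col in columns])
        rows_in_part += 1
        root.clear()  # drop records already written so memory stays bounded
        if rows_in_part >= rows_per_part:
            out.close()
            out = writer = None
            part_index += 1
            rows_in_part = 0
    if out is not None:
        out.close()
        part_index += 1
    return part_index
```

A call might pass the XML path as `source` and an `open_part` that opens `part_0.csv`, `part_1.csv`, and so on with `newline=""`. If the records are nested deeper than one level, or carry data in attributes, the row-building line would need to change accordingly.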

Create the table with the columns you defined in the CSV file.
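Since the content of the file is unknown, one option is to start with every column typed as `text` and cast later once the data can be inspected. The helper below is a sketch of that idea, not part of the original answer: `create_table_sql` and `quote_ident` are hypothetical names, and the statement is built from the header row of one CSV part.

```python
import csv
import io


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table, csv_file):
    """Build a CREATE TABLE statement with one text column per CSV header field."""
    header = next(csv.reader(csv_file))  # only the first row is read
    cols = ",\n".join(f"    {quote_ident(c)} text" for c in header)
    return f"CREATE TABLE {quote_ident(table)} (\n{cols}\n);"
```

The returned string can be pasted into `psql` or executed through any client. Quoted identifiers are case-sensitive in PostgreSQL, so later queries have to use the same spelling as the CSV header.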

Upload the file(s) into Postgres with the COPY command. It is the fastest way I know to load a large amount of data. It can also be done from Java, with the CopyManager class.
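From Python, the same COPY approach might look like the sketch below. It assumes a psycopg2 connection (the answer itself only mentions Java's CopyManager) and CSV parts that each start with a header row; `copy_csv_parts` is a hypothetical helper name.

```python
def copy_csv_parts(conn, table, paths):
    """Load each CSV part into `table` with COPY ... FROM STDIN.

    Assumption: `conn` is a psycopg2 connection. `table` is interpolated into
    the SQL text, so it must be a trusted, already-quoted identifier.
    """
    sql = f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true)"
    with conn.cursor() as cur:
        for path in paths:
            with open(path, "r", encoding="utf-8", newline="") as f:
                cur.copy_expert(sql, f)  # streams the file to the server
            conn.commit()  # one transaction per part: a failure loses only that part
    return sql
```

Committing after each part is a design choice for restartability. Loading everything in a single transaction would also work if an all-or-nothing import is preferred.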

Depending on the kind of queries you will perform, you will probably want to create indexes:

  1. This will be the most time-consuming part. However, you can use CREATE INDEX CONCURRENTLY, which lets you keep working with the table while the index is built in the background.
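One detail worth knowing: CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so a client that opens transactions implicitly has to be switched to autocommit first. A minimal sketch, again assuming a psycopg2 connection with no transaction currently open; the function name and its parameters are hypothetical.

```python
def create_index_concurrently(conn, index, table, column):
    """Build an index without blocking writes to the table.

    Assumption: `conn` is a psycopg2 connection with no open transaction.
    All three names are interpolated, so they must be trusted identifiers.
    """
    sql = f"CREATE INDEX CONCURRENTLY {index} ON {table} ({column})"
    previous = conn.autocommit
    conn.autocommit = True  # CONCURRENTLY is rejected inside a transaction block
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
    finally:
        conn.autocommit = previous  # restore the caller's setting
    return sql
```

If a concurrent build fails partway, PostgreSQL leaves an invalid index behind, which has to be dropped before retrying.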

If you repeat the import process and already have the table and indexes created, drop the indexes before issuing the COPY command and recreate them afterwards. It will save you a lot of time.
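The drop, load, recreate order can be captured in a small wrapper. This is a sketch only: `reload_with_index_rebuild` is a hypothetical name, `indexes` is assumed to map index names to single column names, `load` is whatever callable performs the COPY, and `conn` is assumed to be a DB-API connection such as psycopg2's.

```python
def reload_with_index_rebuild(conn, table, indexes, load):
    """Drop the given indexes, run `load()`, then recreate the indexes.

    `indexes` maps index name -> column name (trusted identifiers).
    """
    with conn.cursor() as cur:
        for name in indexes:
            cur.execute(f"DROP INDEX IF EXISTS {name}")
    conn.commit()

    load()  # the bulk COPY runs while the table has no indexes to maintain

    with conn.cursor() as cur:
        for name, column in indexes.items():
            cur.execute(f"CREATE INDEX {name} ON {table} ({column})")
    conn.commit()
```

Building each index once over the full table is generally cheaper than updating it for every row that COPY inserts, which is where the time saving comes from.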

If you are still unhappy with the speed of your queries or of index creation, it may be worth using ClickHouse instead. However, that depends on the kind of queries you perform.
