Postgres dump of only parts of tables for a dev snapshot

Question

On production our database is a few hundred gigabytes in size. For development and testing, we need to create snapshots of this database that are functionally equivalent, but which are only 10 or 20 gigs in size.

The challenge is that the data for our business entities are scattered across many tables. We want to create some sort of filtered snapshot so that only some of the entities are included in the dump. That way we can get fresh snapshots every month or so for dev and testing.

For example, let's say we have entities with these one-to-many relationships:

  • Companies have N divisions

  • Divisions have N employees

  • Employees have N attendance records
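To make the filtering problem concrete, here is a minimal Postgres schema for this example. It is a hypothetical sketch: the table names (companies, divisions, employees, attendance_records) and foreign-key columns (company_id, division_id, employee_id) are assumptions, not taken from the question. A partial dump has to follow every one of these foreign keys, plus any others the real schema has.

```sql
-- Hypothetical schema mirroring the example; real names will differ.
-- Each child row points at exactly one parent row (one-to-many).
CREATE TABLE companies (
    id   bigint PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE divisions (
    id         bigint PRIMARY KEY,
    company_id bigint NOT NULL REFERENCES companies (id),
    name       text NOT NULL
);

CREATE TABLE employees (
    id          bigint PRIMARY KEY,
    division_id bigint NOT NULL REFERENCES divisions (id),
    name        text NOT NULL
);

-- The large table: tens of millions of rows in production.
CREATE TABLE attendance_records (
    id          bigint PRIMARY KEY,
    employee_id bigint NOT NULL REFERENCES employees (id),
    attended_on date NOT NULL
);
```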

There are maybe 1000 companies, 2500 divisions, 175000 employees, and tens of millions of attendance records. We want a replicable way to pull, say, the first 100 companies and all of their constituent divisions, employees, and attendance records.

We currently use pg_dump for the schema, and then run pg_dump with --disable-triggers and --data-only to get all the data out of the smaller tables. We don't want to have to write custom scripts to pull out part of the data because we have a fast development cycle and are concerned the custom scripts would be fragile and likely to be out of date.

How can we do this? Are there third-party tools that can help pull out logical partitions from the database? What are these tools called?

Any general advice also appreciated!

Answer

On your larger tables you can use the COPY command to pull out subsets...

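-- Export a subset of a large table to a file on the database server (needs superuser or pg_write_server_files)
COPY (SELECT * FROM mytable WHERE ...) TO '/tmp/myfile.tsv';

-- Load it on the dev database; use the same absolute server-side path (needs superuser or pg_read_server_files)
COPY mytable FROM '/tmp/myfile.tsv';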

https://www.postgresql.org/docs/current/static/sql-copy.html
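Applied to the example in the question, the same COPY technique can pull a repeatable subset: pick the first 100 companies by id, follow the foreign keys down to divisions, employees and attendance records, and export each table's matching rows. This is a sketch under assumptions: the table and column names (companies, divisions, employees, attendance_records, company_id, division_id, employee_id) are hypothetical, the /tmp output paths are arbitrary, and writing server-side files requires superuser or membership in pg_write_server_files (psql's \copy writes files on the client instead).

```sql
-- Hypothetical table/column names; adjust to the real schema.
-- One REPEATABLE READ transaction gives every COPY the same snapshot.
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- Deterministic choice of companies so each monthly dump is comparable.
CREATE TEMP TABLE snap_companies ON COMMIT DROP AS
  SELECT id FROM companies ORDER BY id LIMIT 100;

-- Follow the foreign keys downward, one level at a time.
CREATE TEMP TABLE snap_divisions ON COMMIT DROP AS
  SELECT d.id FROM divisions d JOIN snap_companies c ON d.company_id = c.id;

CREATE TEMP TABLE snap_employees ON COMMIT DROP AS
  SELECT e.id FROM employees e JOIN snap_divisions d ON e.division_id = d.id;

-- Export parents before children so the load order is obvious.
COPY (SELECT * FROM companies WHERE id IN (SELECT id FROM snap_companies))
  TO '/tmp/companies.tsv';
COPY (SELECT * FROM divisions WHERE id IN (SELECT id FROM snap_divisions))
  TO '/tmp/divisions.tsv';
COPY (SELECT * FROM employees WHERE id IN (SELECT id FROM snap_employees))
  TO '/tmp/employees.tsv';
COPY (SELECT a.* FROM attendance_records a
      JOIN snap_employees e ON a.employee_id = e.id)
  TO '/tmp/attendance_records.tsv';

COMMIT;  -- the temp tables are dropped here
```

On the dev side, after moving the files to the dev server, load them with COPY ... FROM in the same parent-first order (after the small tables you already dump in full). Because every exported child row's parent is also in the subset, the foreign-key checks should pass without disabling triggers; any other foreign keys in the real schema need the same kind of filter or a full copy of the referenced table.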

You should consider maintaining a dedicated set of development data rather than just pulling a subset of production. If you're writing unit tests, you could use the same data the tests require, trying to cover all of the possible use cases.
