Maximum (usable) number of rows in a Postgresql table

Question

I realize that, per Pg docs (http://www.postgresql.org/about/), one can store an unlimited number of rows in a table. However, what is the "rule of thumb" for usable number of rows, if any?

Background: I want to store daily readings for a couple of decades for 13 million cells. That works out to 13 M * (366|365) * 20 ~ 9.5e10, or 95 B rows (in reality, around 120 B rows).

So, using table partitioning, I set up a master table, and then inherited tables by year. That divvies up the rows to ~ 5.2 B rows per table.
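
A minimal sketch of that setup, using the inheritance-style partitioning PostgreSQL offered at the time (all table and column names below are illustrative, not from the actual schema):

```sql
-- Parent table holds no data itself; children carry CHECK constraints
-- so the planner can skip irrelevant years (constraint exclusion).
CREATE TABLE readings (
    cell_id   integer,
    read_date date,
    val1      smallint
    -- ... remaining SMALLINT measurement columns
);

CREATE TABLE readings_2010 (
    CHECK (read_date >= DATE '2010-01-01' AND read_date < DATE '2011-01-01')
) INHERITS (readings);

CREATE TABLE readings_2011 (
    CHECK (read_date >= DATE '2011-01-01' AND read_date < DATE '2012-01-01')
) INHERITS (readings);

-- With constraint exclusion enabled, a query filtered on read_date
-- touches only the matching child table(s).
SET constraint_exclusion = partition;
SELECT * FROM readings
WHERE read_date BETWEEN DATE '2011-03-01' AND DATE '2011-03-31';
```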

Each row is 9 SMALLINTs and 2 INTs, i.e., 26 bytes. Add to that the Pg overhead of 23 bytes per row, and we get 49 bytes per row. So each table, without any PK or other index, will weigh in at ~0.25 TB.
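
That arithmetic checks out: 9 × 2 + 2 × 4 = 26 data bytes, plus the ~23-byte tuple header gives 49 bytes, and 49 bytes × ~5.2 B rows ≈ 0.25 TB (alignment padding and per-row item pointers add a bit more in practice). The actual average on-disk row size can also be measured directly; the table name below assumes the illustrative sketch above:

```sql
-- Average bytes per row actually consumed on disk, including headers,
-- padding, and page-level overhead:
SELECT pg_relation_size('readings_2011') / NULLIF(count(*), 0)
       AS avg_bytes_per_row
FROM readings_2011;
```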

For starters, I have created only a subset of the above data, that is, only for about 250,000 cells. I have to do a bunch of tuning (create proper indexes, etc.), but the performance is really terrible right now. Besides, every time I need to add more data, I will have to drop the keys and then recreate them. The saving grace is that once everything is loaded, it will be a read-only database.
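
One common shape for that load cycle, sketched with illustrative names (the file path is a placeholder; server-side COPY needs appropriate privileges, otherwise use psql's \copy):

```sql
-- Dropping indexes before a bulk load avoids per-row index maintenance;
-- rebuilding them afterwards in one pass is much faster.
DROP INDEX IF EXISTS readings_2011_read_date_idx;

COPY readings_2011 FROM '/path/to/readings_2011.csv' WITH (FORMAT csv);

CREATE INDEX readings_2011_read_date_idx ON readings_2011 (read_date);
ANALYZE readings_2011;  -- refresh planner statistics after the load
```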

Any suggestions? Any other strategy for partitioning?

Answer

It's not just "a bunch of tuning (indexes etc.)". This is crucial and a must do.

You posted few details, but let's try.

The rule is: Try and find the most common working set. See if it fits in RAM. Optimize hardware, PG/OS buffer settings and PG indexes/clustering for it. Otherwise look for aggregates, or if it's not acceptable and you need fully random access, think what hardware could scan the whole table for you in reasonable time.
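
For the "look for aggregates" route, a sketch of a pre-computed summary table, assuming the illustrative schema from the question:

```sql
-- If most queries ask monthly questions, answering them from a table
-- roughly 30x smaller than the raw readings can make the working set
-- fit in RAM.
CREATE TABLE readings_monthly AS
SELECT cell_id,
       date_trunc('month', read_date) AS month,
       avg(val1) AS avg_val1
FROM   readings
GROUP  BY cell_id, date_trunc('month', read_date);

CREATE INDEX readings_monthly_idx ON readings_monthly (cell_id, month);
```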

How large is your table (in gigabytes)? How does it compare to total RAM? What are your PG settings, including shared_buffers and effective_cache_size? Is this a dedicated server? If you have a 250-gig table and about 10 GB of RAM, it means you can only fit 4% of the table.
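
A quick way to check those numbers on a live server (the table name assumes the earlier sketch; the comments are common starting points, not tuned recommendations):

```sql
SHOW shared_buffers;        -- a frequent starting point is ~25% of RAM
SHOW effective_cache_size;  -- planner's estimate of RAM usable as cache,
                            -- often 50-75% of RAM on a dedicated box
SELECT pg_size_pretty(pg_total_relation_size('readings_2011'));
                            -- table + indexes + TOAST
```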

Are there any columns which are commonly used for filtering, such as state or date? Can you identify the working set that is most commonly used (like only the last month)? If so, consider partitioning or clustering on these columns, and definitely index them. Basically, you're trying to make sure that as much of the working set as possible fits in RAM.
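
A sketch of the clustering step, reusing the illustrative read_date index from the bulk-load example above:

```sql
-- CLUSTER physically rewrites the table in index order, so range scans
-- on read_date touch far fewer pages; a one-off cost that suits a
-- read-only database.
CLUSTER readings_2011 USING readings_2011_read_date_idx;
ANALYZE readings_2011;
```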

Avoid scanning the table at all costs if it does not fit in RAM. If you really need absolutely random access, the only way to make it usable is really sophisticated hardware: a persistent storage/RAM configuration that can read 250 GB in reasonable time.
