Database choice for large data volume?


Problem Description

I'm about to start a new project which should have a rather large database.

The number of tables will not be large (<15); the majority of the data (99%) will be contained in one big table, which is almost insert/read only (no updates).

The estimated amount of data in that one table is going to grow by 500,000 records a day (roughly 180 million rows per year), and we should keep at least one year of them to be able to run various reports.

There needs to be a (read-only) replicated database as a backup/failover, and perhaps for offloading reports at peak times.

I don't have first-hand experience with databases that large, so I'm asking those who do which DB is the best choice in this situation. I know that Oracle is the safe bet, but I'm more interested in whether anyone has experience with PostgreSQL or MySQL in a similar setup.

Recommended Answer

I've used PostgreSQL in an environment where we're seeing 100K-2M new rows per day, most added to a single table. However, those rows tend to be reduced to samples and then deleted within a few days, so I can't speak about long-term performance with more than ~100M rows.

I've found that insert performance is quite reasonable, especially if you use the bulk COPY. Query performance is fine, although the choices the planner makes sometimes puzzle me; particularly when doing JOINs / EXISTS. Our database requires pretty regular maintenance (VACUUM/ANALYZE) to keep it running smoothly. I could avoid some of this by more carefully optimizing autovacuum and other settings, and it's not so much of an issue if you're not doing many DELETEs. Overall, there are some areas where I feel it's more difficult to configure and maintain than it should be.
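The autovacuum tuning the answer alludes to lives in `postgresql.conf`. The values below are illustrative placeholders only, not recommendations; appropriate settings depend on table size and delete/update rates.

```
# Example autovacuum settings (illustrative values, tune for your workload)
autovacuum = on
autovacuum_naptime = 1min                  # how often the launcher checks tables
autovacuum_vacuum_scale_factor = 0.05      # vacuum after ~5% of rows change
autovacuum_analyze_scale_factor = 0.02     # re-analyze after ~2% of rows change
autovacuum_vacuum_cost_limit = 2000        # let vacuum do more work per cycle
```
For a mostly insert-only table with few DELETEs, the ANALYZE side (keeping planner statistics fresh) tends to matter more than the VACUUM side.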
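As an illustration of the bulk COPY approach mentioned above, here is a minimal sketch of staging rows into an in-memory CSV buffer that a PostgreSQL driver can stream via `COPY ... FROM STDIN`. The table name `events` and its columns are hypothetical, and `psycopg2` is assumed as the driver; the buffer-building part runs standalone.

```python
# Sketch: preparing a batch of rows for PostgreSQL's bulk COPY,
# which is typically much faster than row-by-row INSERTs.
import csv
import io

def rows_to_copy_buffer(rows):
    """Serialize rows into an in-memory CSV buffer suitable for COPY ... FROM STDIN."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # csv module uses "\r\n" line endings by default
    writer.writerows(rows)
    buf.seek(0)
    return buf

# Hypothetical target table and columns.
COPY_SQL = "COPY events (ts, device_id, value) FROM STDIN WITH (FORMAT csv)"

# With a live psycopg2 connection (not shown), the buffer would be streamed with:
#   cur.copy_expert(COPY_SQL, rows_to_copy_buffer(rows))
#   conn.commit()
```

Batching a day's inserts this way keeps per-row overhead low, which matters at 500,000 rows/day.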

I have not used Oracle, and have used MySQL only for small datasets, so I can't compare performance. But PostgreSQL does work fine for large datasets.
