Which db manager for a 100Go Table?

Problem description

I am building a 2G/3G/4G data-retrieval project as part of my studies. I have to store this data and run queries on it. My table: [freq {float}, dbm {float}, timestamp {int}]. I receive about 15 GB of data per day, from 100,000 to 200,000 entries per minute, and that runs for 6 days.
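For scale, the stated figures can be sanity-checked with quick arithmetic (taking the upper bound of 200,000 entries per minute):

```python
# Back-of-envelope check of the ingest rate stated in the question.
entries_per_min = 200_000                     # upper bound from the question
rows_per_day = entries_per_min * 60 * 24      # rows accumulated in 24 hours
bytes_per_day = 15 * 1024**3                  # ~15 GB of data per day
bytes_per_row = bytes_per_day / rows_per_day  # implied on-disk cost per row
print(rows_per_day, round(bytes_per_row))     # ~288 million rows, ~56 bytes/row
```

So the question is really about sustaining roughly 3,000+ small inserts per second for days on end.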

I could use a simple DBMS (MySQL/PostgreSQL), but I'm afraid the performance won't be there. I tried InfluxDB, but the number of rows it recorded per minute was below my needs.

Do you have another solution?

Thanks a lot, J-F

Recommended answer

I use all the databases you mention. For this load I can recommend MySQL or PostgreSQL, because I have already worked with an even higher load on PostgreSQL. But MySQL will do the same job too - maybe even better, because it was designed from the beginning for a high insert load.

The PostgreSQL solution I worked with was used for storing system messages from a telecommunication network, and it was able to collect ~300 GB of data per day on one machine without problems. But you need a proper hardware architecture.

You need a machine with at least 8 CPUs (more is better), and you need several inserting queues. Use a loader written in Java, C, or Go with multiple parallel threads, and do bulk inserts from every thread with the COPY command, ~10,000 records per batch. You must use a connection pool, because PostgreSQL has a high overhead for opening new connections.
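A minimal sketch of the batching side of such a loader, assuming psycopg2 (the `measurements` table and its column names are made up for illustration); each loader thread would run `load` against its own pooled connection:

```python
import io
from itertools import islice

BATCH_SIZE = 10_000  # ~10,000 records per COPY, as suggested above

def batches(records, size=BATCH_SIZE):
    """Yield lists of up to `size` records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def copy_buffer(batch):
    """Format one batch as tab-separated text for COPY ... FROM STDIN."""
    buf = io.StringIO()
    for freq, dbm, ts in batch:
        buf.write(f"{freq}\t{dbm}\t{ts}\n")
    buf.seek(0)
    return buf

def load(conn, records):
    """Bulk-insert via COPY: one server round-trip per 10,000-row batch."""
    with conn.cursor() as cur:
        for batch in batches(records):
            cur.copy_expert(
                "COPY measurements (freq, dbm, ts) FROM STDIN", copy_buffer(batch)
            )
    conn.commit()
```

One COPY per batch amortizes the per-statement overhead that makes row-by-row INSERTs too slow at this rate.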

It will also help to distribute the data over several tablespaces, each on a separate physical disk, or better, on a separate physical disk array. If possible, do not put indexes on the raw data. Keep the raw data separate from the aggregated results.
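As a rough illustration of separating raw data from aggregates, a small summary can be rebuilt from raw rows with a pass like this (the per-minute average is an assumed aggregation, not something prescribed above); queries then hit the summary instead of the index-free raw data:

```python
from collections import defaultdict

def aggregate_per_minute(rows):
    """Collapse raw (freq, dbm, ts) rows into an average dbm per (freq, minute)."""
    sums = defaultdict(lambda: [0.0, 0])  # (freq, minute) -> [dbm_sum, count]
    for freq, dbm, ts in rows:
        key = (freq, ts // 60)  # unix timestamp truncated to the minute
        sums[key][0] += dbm
        sums[key][1] += 1
    return {key: s / n for key, (s, n) in sums.items()}
```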

We had another solution using the pl/proxy extension for PostgreSQL and several physical machines, with the raw data partitioned by time. That system was able to collect at least 1 TB per day, and even more with a proper number of slave databases.
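The time-based routing in such a setup can be sketched roughly as follows; the partition naming scheme and shard list here are invented for illustration and are not pl/proxy's actual API (pl/proxy itself dispatches via SQL `RUN ON` rules):

```python
from datetime import datetime, timezone

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical machine names

def partition_name(ts):
    """Daily partition a raw row lands in, e.g. raw_19700102 (assumed naming)."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"raw_{day:%Y%m%d}"

def shard_for(ts):
    """Route whole days to machines round-robin across the shard list."""
    return SHARDS[(ts // 86_400) % len(SHARDS)]
```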

But you have to understand that to really process this amount of data, you need proper hardware with a proper configuration. There is no magic database that will do miracles on some "notebook-like" configuration...

InfluxDB is a really great time-series database, and we use it for monitoring. I believe that with enough CPUs and really a lot of memory you will be able to use it too. I estimate you will need a minimum of 64 GB of RAM, because inserts are memory-expensive. With more inserting queues the database will need much more memory, because it stores everything in memory and automatically builds indexes on tags.
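Because InfluxDB indexes every tag value in memory, high-cardinality values such as the frequency are better sent as fields, not tags. A minimal sketch of building line-protocol records that way (the measurement name `signal` and tag `site` are made up for illustration):

```python
def line_protocol(measurement, tags, fields, ts_ns):
    """Build one InfluxDB line-protocol record: measurement,tags fields timestamp.

    Keep only low-cardinality values (e.g. a site id) in `tags`; put the
    per-sample freq/dbm readings in `fields` so they are not indexed.
    """
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"
```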
