Storing large amounts of data in a database


Question

I'm currently working on a home-automation project that lets users view their energy usage over a period of time. We currently request data every 15 minutes, and we expect around 2000 users for our first big pilot.

My boss is requesting that we store at least half a year of data. A quick calculation gives an estimate of around 35 million records. Though these records are small (around 500 bytes each), I'm still wondering whether storing them in our database (Postgres) is the right decision.
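
The 35 million figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming "half a year" means 182 days; the names are illustrative only.

```python
# Back-of-the-envelope record count for the pilot.
# Assumption: "half a year" is taken as 182 days.
USERS = 2000
READINGS_PER_HOUR = 60 // 15   # one reading every 15 minutes
DAYS = 182


def pilot_record_count(users=USERS, readings_per_hour=READINGS_PER_HOUR, days=DAYS):
    """Return the number of rows stored over the retention period."""
    return users * readings_per_hour * 24 * days


print(pilot_record_count())
```

With these inputs the count lands just under 35 million rows, which matches the estimate above.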

Does anyone have good reference material and/or advice on how to deal with this amount of information?

Recommended answer

For now, 35M records of 0.5 KB each means roughly 17.5 GB of data. This fits in a database for your pilot, but you should also think about the next step after the pilot. Your boss will not be happy if the pilot is a big success and you then have to tell him that you cannot add 100,000 users to the system in the following months without redesigning everything. Moreover, what about a new feature that lets VIP users request data every minute?
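
Multiplying the figures out gives the raw payload size. The sketch below ignores indexes and per-row overhead, so a real Postgres table would be larger. The growth scenario (100,000 users sampled every minute for 182 days) is an assumption that simply combines the numbers mentioned in this answer.

```python
def raw_size_gb(records, bytes_per_record=500):
    """Raw payload size in GB (10**9 bytes), ignoring indexes and row overhead."""
    return records * bytes_per_record / 10**9


# Pilot: about 35 million records of roughly 500 bytes each.
pilot_gb = raw_size_gb(35_000_000)

# Hypothetical growth scenario: 100,000 users, one reading per minute, 182 days.
growth_records = 100_000 * 60 * 24 * 182
growth_gb = raw_size_gb(growth_records)

print(f"pilot: {pilot_gb:.1f} GB, growth scenario: {growth_gb:.0f} GB")
```

The gap between the two results is the point of the warning: the pilot fits comfortably on one machine, while the growth scenario is in the terabyte range.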

This is a complex issue, and the choice you make will constrain how your software can evolve.

For the pilot, keep it as simple as possible so you can get the product out as cheaply as possible: a database is fine for that. But tell your boss that you cannot open the service to the public like that, and that you will have to change things before taking on 10,000 new users per week.

One thing for the next release: have several data repositories, for example one for your user data that is updated frequently, one for your queries/statistics system, and so on.
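
One possible reading of "several data repositories" is a write-heavy store for raw readings plus a separate, pre-aggregated store that the statistics pages query. The sketch below illustrates that split with an in-memory SQLite database standing in for the real stores. The table names, column names, and the `refresh_daily_usage` job are all hypothetical and are not part of the original answer.

```python
import sqlite3

# In-memory SQLite stands in for the real stores; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (          -- write-heavy store, one row per sample
        user_id INTEGER NOT NULL,
        ts      TEXT    NOT NULL,    -- ISO 8601 timestamp
        wh      REAL    NOT NULL     -- energy used in the interval
    );
    CREATE TABLE daily_usage (       -- read-optimised store for statistics
        user_id INTEGER NOT NULL,
        day     TEXT    NOT NULL,
        wh      REAL    NOT NULL,
        PRIMARY KEY (user_id, day)
    );
""")


def record_reading(user_id, ts, wh):
    """Append one raw sample to the write-heavy table."""
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (user_id, ts, wh))


def refresh_daily_usage():
    """Rebuild the statistics table from the raw readings (e.g. from a periodic job)."""
    conn.execute("DELETE FROM daily_usage")
    conn.execute("""
        INSERT INTO daily_usage (user_id, day, wh)
        SELECT user_id, substr(ts, 1, 10), SUM(wh)
        FROM readings
        GROUP BY user_id, substr(ts, 1, 10)
    """)


record_reading(1, "2024-01-01T00:00:00", 120.0)
record_reading(1, "2024-01-01T00:15:00", 80.0)
record_reading(1, "2024-01-02T00:00:00", 50.0)
refresh_daily_usage()

rows = conn.execute(
    "SELECT day, wh FROM daily_usage WHERE user_id = 1 ORDER BY day"
).fetchall()
print(rows)
```

Usage queries then read the small `daily_usage` table instead of scanning every raw reading, and the two stores can later be moved to different systems independently.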

You could look at RRD for your next release.
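
Assuming RRD here means a round-robin database in the style of RRDtool, the core idea is a fixed-size store in which each new sample overwrites the oldest one, so storage never grows. The following is a plain-Python sketch of that idea, not RRDtool's API; the class and method names are invented for illustration.

```python
from collections import deque


class RoundRobinSeries:
    """Fixed-capacity time series: the newest sample overwrites the oldest."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest item automatically when full.
        self._samples = deque(maxlen=capacity)

    def add(self, timestamp, value):
        self._samples.append((timestamp, value))

    def values(self):
        return list(self._samples)

    def average(self):
        """Mean of the retained values, or None if nothing is stored yet."""
        if not self._samples:
            return None
        return sum(v for _, v in self._samples) / len(self._samples)


series = RoundRobinSeries(capacity=4)   # e.g. one hour of 15-minute samples
for minute, wh in [(0, 100), (15, 110), (30, 90), (45, 120), (60, 80)]:
    series.add(minute, wh)

print(series.values())
print(series.average())
```

After the fifth sample is added, the first one has been dropped, which is what keeps the storage requirement constant per user.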

Also keep in mind the update frequency: 2000 users updating data every 15 minutes means about 2.2 updates per second, which is fine; 100,000 users updating data every 5 minutes means about 333.3 updates per second. I am not sure a simple database can keep up with that, and a single web service server definitely cannot.
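
The update rates follow directly from dividing the number of users by the reporting interval. This is a minimal sketch that assumes updates are spread evenly over the interval; if all devices report at the same moment, the peak rate would be far higher than this average.

```python
def updates_per_second(users, interval_minutes):
    """Average write rate, assuming updates are spread evenly over the interval."""
    return users / (interval_minutes * 60)


pilot_rate = updates_per_second(2000, 15)
growth_rate = updates_per_second(100_000, 5)

print(f"pilot: {pilot_rate:.1f}/s, growth: {growth_rate:.1f}/s")
```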
