What is the most efficient way to store measurements with a variable number of fields in a database?
Problem description
We have a data collection system that collects measurements from environmental sensors that measure the velocity of water flowing through a river or channel. Each measurement generates a fixed number of values (e.g. Date, Time, Temperature, Pressure etc.) plus a list of velocity values.
Originally the sensors supplied three velocity values, so I simply stored each value in its own column of a single table in a FireBird database. Later on, sensors were introduced that could output up to nine velocity values, so I simply added six more columns. Even though most sensors use fewer than nine values, I reckoned it would not be a problem if most of the columns just contained zeroes.
But now I'm facing a new generation that can output anything from 1 to 256 values and I assume it will not be very efficient to add another 247 columns, especially since most of the measurements will still only contain 3 to 9 values.
Since the measurements are collected every 10 minutes and the database contains all data for 30 to 50 sensors, the total amount of data is quite significant after a few years, yet it must be possible to generate overviews/graphs for any arbitrary period of time.
So what would be the most efficient way to store the variable list of values?
Since each record has its own unique ID, I assume I could just store all velocity values in a separate table, each value tagged with its record ID. I just have the feeling that this would not be very efficient and that it would get very slow after a while.
Recommended answer
Databases can handle large amounts of data in a table if you use efficient indexes. So you can use this table structure:
create table measurements (
    id integer, -- ID of the measurement record (integer type assumed)
    seq integer, -- between 1 and 256
    ts timestamp, -- Timestamp of the measurement
    value decimal(...)
)
Create an index on id, on (id, seq), and on ts. That will allow you to search efficiently through the data. If you distrust your database, just insert a few million rows and run a couple of selects to see how well it fares.
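The table and the three indexes described above can be tried out quickly before committing to them. The following is only a minimal sketch: it uses Python's built-in sqlite3 module in place of FireBird, stores the timestamp as Unix seconds and the value as REAL (both assumptions made for SQLite), and the index and function names are made up for illustration.

```python
import sqlite3


def build_demo_db(n_measurements=1000, values_per_measurement=9):
    """Create an in-memory table shaped like the one above and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE measurements (
            id    INTEGER,  -- ID of the measurement record (type assumed)
            seq   INTEGER,  -- between 1 and 256
            ts    INTEGER,  -- timestamp as Unix seconds (assumption for SQLite)
            value REAL      -- stands in for decimal(...)
        )
        """
    )
    # The three indexes named in the answer (index names are hypothetical).
    conn.execute("CREATE INDEX idx_meas_id ON measurements (id)")
    conn.execute("CREATE INDEX idx_meas_id_seq ON measurements (id, seq)")
    conn.execute("CREATE INDEX idx_meas_ts ON measurements (ts)")

    # One measurement every 600 seconds, each with the same number of values.
    rows = (
        (m, s, m * 600, 0.01 * s)
        for m in range(1, n_measurements + 1)
        for s in range(1, values_per_measurement + 1)
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def values_for(conn, measurement_id):
    """Return the (seq, value) pairs of one measurement, ordered by seq."""
    cur = conn.execute(
        "SELECT seq, value FROM measurements WHERE id = ? ORDER BY seq",
        (measurement_id,),
    )
    return cur.fetchall()


def count_in_period(conn, ts_from, ts_to):
    """Count the stored values with ts_from <= ts < ts_to."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM measurements WHERE ts >= ? AND ts < ?",
        (ts_from, ts_to),
    )
    return cur.fetchone()[0]
```

Calling build_demo_db with a few hundred thousand measurements and timing values_for and count_in_period gives a rough first impression, although the numbers from SQLite will not carry over directly to FireBird.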
For comparison: I have an Oracle database here with 112 million rows, and I can select a record by timestamp or ID within 120 ms (0.12 s).
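To put the figures from the question next to that row count, here is a back-of-the-envelope estimate of how many rows one year of data would produce with one row per velocity value. It is only a sketch: the function name is made up, and it assumes every sensor reports every 10 minutes without gaps.

```python
def rows_per_year(sensors, values_per_measurement, interval_minutes=10):
    """Estimate rows per year when each velocity value is stored as one row."""
    measurements_per_day = 24 * 60 // interval_minutes  # 144 at 10 minutes
    return sensors * measurements_per_day * 365 * values_per_measurement


print(rows_per_year(30, 3))    # 4730400   (30 sensors, 3 values each)
print(rows_per_year(50, 9))    # 23652000  (50 sensors, 9 values each)
print(rows_per_year(50, 256))  # 672768000 (worst case: 50 sensors, 256 values each)
```

With the typical 3 to 9 values per measurement, this stays in the range of a few million to a few tens of millions of rows per year.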