SQL数据库中的维度和单元分析 [英] dimensional and unit analysis in SQL database

查看:20
本文介绍了SQL数据库中的维度和单元分析的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题:

存储各种测量值的时间序列数据的关系数据库 (Postgres).每个测量值都可以具有特定的测量类型"(例如温度、溶解氧等),并且可以具有特定的测量单位"(例如华氏/摄氏度/开尔文、百分比/毫克/升等).

A relational database (Postgres) storing timeseries data of various measurement values. Each measurement value can have a specific "measurement type" (e.g. temperature, dissolved oxygen, etc) and can have specific "measurement units" (e.g. Fahrenheit/Celsius/Kelvin, percent/milligrams per liter, etc).

问题:

有没有人建立一个类似的数据库来保存维度完整性?有什么建议吗?

我正在考虑构建一个measurement_type 和一个measurement_unit 表,这两个表都有文本两列,ID 和文本.然后我会在measured_value 表中为这些表创建外键.文本让我有些担心,因为可能存在非唯一重复项(例如,'ug/l' vs 'µg/l' 表示每升微克).

I'm considering building a measurement_type and a measurement_unit table, both of these would have text two columns, ID and text. Then I would create foreign keys to these tables in the measured_value table. Text worries me somewhat because there's the possibility for non-unique duplicates (e.g. 'ug/l' vs 'µg/l' for micrograms per liter).

这样做的目的是让我可以在查询中或通过外部编程转换和验证单位.理想情况下,我以后有能力进行严格的尺寸分析(例如,将 µg/l 与值M/V"(质量除以体积)联系起来).

The purpose of this would be so that I can both convert and verify units on queries, or via programming externally. Ideally, I would have the ability later to include strict dimensional analysis (e.g. linking µg/l to the value 'M/V' (mass divided by volume)).

有没有更优雅的方法来实现这一点?

推荐答案

我在很久以前就制作了一个用于处理单元的数据库子模式(好吧,我稍微夸大了一点;不过那是大约 20 年前的事了).幸运的是,它只需要处理简单的质量、长度、时间维度——而不是温度、电流或光度等.游戏的货币方面并不那么简单——一种货币之间有无数种不同的转换方式另一个取决于转换率有效的日期、货币和期间.这是与物理单位分开处理的.

I produced a database sub-schema for handling units an aeon ago (okay, I exaggerate slightly; it was about 20 years ago, though). Fortunately, it only had to deal with simple mass, length, time dimensions - not temperature, or electric current, or luminosity, etc. Rather less simple was the currency side of the game - there were a myriad different ways of converting between one currency and another depending on date, currency, and period over which conversion rate was valid. That was handled separately from the physical units.

从根本上说,我创建了一个度量"表,其中包含一个id"列、一个单位名称、一个缩写和一组维度指数——质量、长度和时间各一个.这将填充诸如体积"(长度 = 3、质量 = 0、时间 = 0)、密度"(长度 = 3、质量 = -1、时间 = 0)之类的名称.

Fundamentally, I created a table 'measures' with an 'id' column, a name for the unit, an abbreviation, and a set of dimension exponents - one each for mass, length, time. This gets populated with names such as 'volume' (length = 3, mass = 0, time = 0), 'density' (length = 3, mass = -1, time = 0) - and the like.

还有第二个单位表,它标识了一个度量,然后是特定度量使用的实际单位.例如,有桶、立方米,以及各种其他相关单位.

There was a second table of units, which identified a measure and then the actual units used by a particular measurement. For example, there were barrels, and cubic metres, and all sorts of other units of relevance.

第三个表格定义了特定单位之间的换算系数.这包括两个单元和将单元 1 转换为单元 2 的乘法转换因子.这里最大的问题是转换因子的动态范围.如果从 U1 到 U2 的转换是 1.234E+10,那么倒数是一个很小的数字(8.103727714749e-11).

There was a third table that defined conversion factors between specific units. This consisted of two units and the multiplicative conversion factor that converted unit 1 to unit 2. The biggest problem here was the dynamic range of the conversion factors. If the conversion from U1 to U2 is 1.234E+10, then the inverse is a rather small number (8.103727714749e-11).

S.Lott 关于温度的评论很有趣——我们不必处理这些.存储过程可以解决这个问题 - 尽管将一个存储过程集成到系统中可能会很棘手.

The comment from S.Lott about temperatures is interesting - we didn't have to deal with those. A stored procedure would have addressed that - though integrating one stored procedure into the system might have been tricky.

我描述的方案允许对大多数转换进行一次描述(包括假设单位,例如每两周弗隆,或不太假设但同样模糊的单位 - 在美国以外 - 如英亩-英尺),并且可以验证转换(对于例如,转换系数表中的两个单位必须具有相同的度量).它可以扩展到处理大多数其他单位——尽管角度(或立体角)等无量纲单位存在一些有趣的问题.有支持代码可以处理任意转换 - 或在无法支持转换时生成错误.使用该系统的一个原因是,各个国际关联公司将在当地方便的单位报告他们的数据,但总部系统必须接受原始数据,并以适合经理人的单位呈现结果汇总数据——其中不同的经理人各自对于报告的最佳单位,他们有自己的想法(基于他们的国家背景和在总部的工作年限).

The scheme I described allowed most conversions to be described once (including hypothetical units such as furlongs per fortnight, or less hypothetical but equally obscure ones - outside the USA - like acre-feet), and the conversions could be validated (for example, both units in the conversion factor table had to have the same measure). It could be extended to handle most of the other units - though the dimensionless units such as angles (or solid angles) present some interesting problems. There was supporting code that would handle arbitrary conversions - or generate an error when the conversion could not be supported. One reason for this system was that the various international affiliate companies would report their data in their locally convenient units, but the HQ system had to accept the original data and yet present the resulting aggregated data in units that suited the managers - where different managers each had their own idea (based on their national background and length of duty in the HQ) about the best units for their reports.

这篇关于SQL数据库中的维度和单元分析的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆