How to calculate the likely size of an OLAP cube

Question

Does anyone know a method for getting a rough size estimate of an OLAP cube built on a star schema data warehouse? Ideally something based on the number of dimensions, the number of records in the dimension tables, the number of fact records, and finally the number of aggregations or distinct records, etc.

The database I am looking at has a fact table of over 20 billion rows and a few dimension tables of 20 million, 70 million and 1.3 billion rows.

Thanks, Nicholas

Recommended answer

I'm by no means an expert in OLAP, but just off the top of my head I can see some pretty fundamental roadblocks to creating this estimate. In particular, knowing the row counts and cardinalities of the dimension tables in isolation isn't nearly as important as the relationships between them.

Example: Imagine two low-cardinality dimensions with n and m unique values respectively. Caching OLAP aggregates over the dimensions could produce anywhere from n + m values to n * m values depending on how closely the relationship between the dimensions resembles a pure bijection. Given just the information you provided, I'm pretty sure all you can safely say is that you'll end up with fewer than 3.64 * 10^34 values. This is obviously not very useful.
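To make the arithmetic concrete, here is a minimal Python sketch. The first part reproduces the 3.64 * 10^34 ceiling by multiplying the row counts quoted in the question. The second part uses two made-up toy datasets (the names `bijective` and `independent` and the helper `distinct_pairs` are illustrative assumptions, not anything from the question) to show how the number of non-empty combinations of two dimensions swings with the relationship between them. It counts only the combined (a, b) cells, which is a simplification of what a real cube would cache.

```python
from math import prod

# Row counts quoted in the question: one fact table and three dimension tables.
fact_rows = 20_000_000_000
dimension_rows = [20_000_000, 70_000_000, 1_300_000_000]

# Naive ceiling: the product of every row count, ignoring all relationships.
naive_bound = fact_rows * prod(dimension_rows)
print(f"{naive_bound:.2e}")  # 3.64e+34


def distinct_pairs(fact):
    """Count the non-empty (a, b) cells present in a list of fact rows."""
    return len(set(fact))


n = m = 100

# Hypothetical extreme 1: the dimensions are in one-to-one correspondence.
bijective = [(i, i) for i in range(n)]

# Hypothetical extreme 2: every value of one dimension occurs with every
# value of the other.
independent = [(i, j) for i in range(n) for j in range(m)]

print(distinct_pairs(bijective))    # 100
print(distinct_pairs(independent))  # 10000
```

Both toy datasets have the same dimension cardinalities, yet the number of populated cells differs by a factor of 100, which is why the row counts alone pin down so little.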

I'm pessimistic that you would be able to create any general algorithm that provides estimates efficiently enough that it wouldn't make more sense to just generate the cube and weigh it when you're done. I can think of theoretical methods you could apply if you had bitmap indices of all of your dimensions, but 1) you probably don't and 2) the implementation would be an adventure, and one that's more advanced than I can comfortably help you with.
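The answer does not spell out the bitmap-index approach, so the following is only one possible reading of it, sketched in Python under stated assumptions: each distinct dimension value gets a bitmap over the fact rows (plain Python integers stand in for a real bitmap index here), and a cell for a pair of values is non-empty exactly when the two bitmaps share a set bit. The function names and the toy `region` / `product` columns are hypothetical.

```python
from collections import defaultdict


def build_bitmaps(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = defaultdict(int)
    for row, value in enumerate(column):
        bitmaps[value] |= 1 << row
    return dict(bitmaps)


def count_nonempty_cells(bitmaps_a, bitmaps_b):
    """Count value pairs that co-occur in at least one fact row."""
    return sum(
        1
        for a in bitmaps_a.values()
        for b in bitmaps_b.values()
        if a & b  # a non-zero AND means some row carries both values
    )


# Toy fact table with two dimension columns (made-up data).
region = ["EU", "EU", "US", "US", "EU"]
product = ["x", "y", "x", "x", "x"]

cells = count_nonempty_cells(build_bitmaps(region), build_bitmaps(product))
print(cells)  # 3: (EU, x), (EU, y), (US, x); (US, y) never occurs
```

Note that this still performs one AND per pair of values, so on dimensions with tens of millions of members it would be far from cheap, which fits the answer's point that a practical implementation would be a project in its own right.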
