使用链表的多维稀疏数组 [英] Multidimensional sparse arrays using linked lists

查看:60
本文介绍了使用链表的多维稀疏数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述




我正在实现聚合模型来计算数据立方体。

我已经静态地实现了多路数组聚合。也就是说,如果有基数16和块大小为4的4个维度,我创建chunkArray [4] [4] [4] [4]并将度量插入适当的位置并聚合度量。 chunkArray是一个稀疏数组,静态实现效率不高。为了动态创建chunkArray,其中仅包含为发生的实例存储的度量,我需要块中每个单元格的偏移量。


例如:维度3

A [8],B [8],C [8] - 每个具有基数的维度8.

如果关系数据库表已完全填充,那么将有8 * 8 * 8个实例在表中。

如果没有,那么一些实例将会丢失。

a1,b1,c1,20

a1,b1 ,c2,3

.......

.......

a1,b1,c8,12


这就像真值表一样,有3个变量,每个变量最多可以占用8个值,而数字真值表中只有2个值。


聚合将在块上完成。如果我有实例a1,b1,c1 - a1,b1,c3 - a1,b1,c8 - 某些实例将丢失。我需要的是实例与第一个实例的偏移量。

a1,b1,c1 offset = 0

a1,b1,c3 offset = 2
a1,b1,c8 offset = 7

假设所有维度的块大小为2,则块的总数将为8.我将引入前8个块单元和聚合。但问题是,在前8个细胞中只有3个是密集的,其余5个不存在。我被困在找到块单元的偏移量。如果您有任何想法,请告诉我。

Hi,

I am implementing an aggregation model to compute data cubes.
I have implemented the multiway array aggregation statically. That is if there are, say, 4 dimensions each of cardinality 16 and the chunk size of 4, I create chunkArray[4][4][4][4] and insert the measure at the appropriate position and aggregate the measure. The chunkArray is a sparse array and static implementation is not efficient. In order the dynamically create the chunkArray with measure stored only for the instances that occur I need the offset of each of the cell in the chunk.

Eg: Dimension 3
A[8], B[8], C[8] - each dimension with cardinality 8.
If the relational database table is fully populated then there will be 8*8*8 instances in the table.
If not, then some of the instances will be missing.
a1,b1,c1,20
a1,b1,c2,3
.......
.......
a1,b1,c8,12

This will be like the truth table with 3 variables each of which can take upto 8 values as compared to 2 values in the digital truth table.

The aggregation will be done on chunks. If I have instances a1,b1,c1 - a1,b1,c3 - a1,b1,c8 - some of the instances will be missing. What I need then is an offset of the instance from the first instance.
a1,b1,c1 offset=0
a1,b1,c3 offset=2
a1,b1,c8 offset=7

Suppose the chunk size is 2 in all dimensions, the the total number of chunks will be 8. I will bring in the first 8 chunk cells and aggregate. But the problem is, in the first 8 cells only 3 are dense and rest 5 are not there. I am stuck in finding the offset of the chunk cell. If you have any idea, please let me know.

推荐答案

您能否让我们更好地了解您将来如何使用这些块?我不确定这个和一个表格之间有什么区别,然后是b然后是c。
Could you give us a better idea of how you will be using the chunks in the future? I''m not sure what the difference is between this and a table sorted on a then b then c.


我将使用这些块来聚合它们中存在的值。例如,


chunkArray [0] [0] [0] = 5;

chunkArray [0] [0] [7] = 9; <当我将前2个指数保持不变时,我会得到13来支付



假设A代表CITY,B代表ITEM,C代表TIME 。

这些是大块的3个尺寸。


城市项目时间

0- X 0-i1 0-t1

1- Y 1-i2 1-t2

2- Z 2-i3 2-t3


措施可能是销售。 keeing A = 0 B = 0并聚合alon C,我将一直在城市X中获得项目Y的销售
I will be using the chunks to aggregate the values present in them. For eg,

chunkArray[0][0][0] = 5;
chunkArray[0][0][7] = 9;

when i aggregate keeping the first 2 indices constant, i ll get 13.

Say A represents CITY, B represent ITEM and C represents TIME.
These are the 3 dimensions for the chunk.

CITY ITEM TIME
0- X 0-i1 0-t1
1- Y 1-i2 1-t2
2- Z 2-i3 2-t3

the measure may be sales. keeing A=0 B=0 and aggregating alon C, I will get the sales in city X for item Y for all the times


您使用的是C还是C ++?如果你正在使用C ++并且你期望数组非常稀疏我想象地图会很好用。


另一方面如果数组只是略微稀疏(只有少数几个)缺少值)然后使用静态数组可能同样简单且不那么复杂。
Are you using C or C++? If you are using C++ and you expect the array to be quite sparse I imagine a map would work quite well.

On the other hand if the array is only slightly sparse (only a few missing values) then it might be just as easy and less complicated to use a static array.


这篇关于使用链表的多维稀疏数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆