使用链表的多维稀疏数组 [英] Multidimensional sparse arrays using linked lists
问题描述
我正在实现聚合模型来计算数据立方体。
我已经静态地实现了多路数组聚合。也就是说,如果有基数16和块大小为4的4个维度,我创建chunkArray [4] [4] [4] [4]并将度量插入适当的位置并聚合度量。 chunkArray是一个稀疏数组,静态实现效率不高。为了动态创建chunkArray,其中仅包含为发生的实例存储的度量,我需要块中每个单元格的偏移量。
例如:维度3
A [8],B [8],C [8] - 每个具有基数的维度8.
如果关系数据库表已完全填充,那么将有8 * 8 * 8个实例在表中。
如果没有,那么一些实例将会丢失。
a1,b1,c1,20
a1,b1 ,c2,3
.......
.......
a1,b1,c8,12
这就像真值表一样,有3个变量,每个变量最多可以占用8个值,而数字真值表中只有2个值。
聚合将在块上完成。如果我有实例a1,b1,c1 - a1,b1,c3 - a1,b1,c8 - 某些实例将丢失。我需要的是实例与第一个实例的偏移量。
a1,b1,c1 offset = 0
a1,b1,c3 offset = 2 >
a1,b1,c8 offset = 7
假设所有维度的块大小为2,则块的总数将为8.我将引入前8个块单元和聚合。但问题是,在前8个细胞中只有3个是密集的,其余5个不存在。我被困在找到块单元的偏移量。如果您有任何想法,请告诉我。
Hi,
I am implementing an aggregation model to compute data cubes.
I have implemented the multiway array aggregation statically. That is if there are, say, 4 dimensions each of cardinality 16 and the chunk size of 4, I create chunkArray[4][4][4][4] and insert the measure at the appropriate position and aggregate the measure. The chunkArray is a sparse array and static implementation is not efficient. In order the dynamically create the chunkArray with measure stored only for the instances that occur I need the offset of each of the cell in the chunk.
Eg: Dimension 3
A[8], B[8], C[8] - each dimension with cardinality 8.
If the relational database table is fully populated then there will be 8*8*8 instances in the table.
If not, then some of the instances will be missing.
a1,b1,c1,20
a1,b1,c2,3
.......
.......
a1,b1,c8,12
This will be like the truth table with 3 variables each of which can take upto 8 values as compared to 2 values in the digital truth table.
The aggregation will be done on chunks. If I have instances a1,b1,c1 - a1,b1,c3 - a1,b1,c8 - some of the instances will be missing. What I need then is an offset of the instance from the first instance.
a1,b1,c1 offset=0
a1,b1,c3 offset=2
a1,b1,c8 offset=7
Suppose the chunk size is 2 in all dimensions, the the total number of chunks will be 8. I will bring in the first 8 chunk cells and aggregate. But the problem is, in the first 8 cells only 3 are dense and rest 5 are not there. I am stuck in finding the offset of the chunk cell. If you have any idea, please let me know.
推荐答案
您能否让我们更好地了解您将来如何使用这些块?我不确定这个和一个表格之间有什么区别,然后是b然后是c。
Could you give us a better idea of how you will be using the chunks in the future? I''m not sure what the difference is between this and a table sorted on a then b then c.
我将使用这些块来聚合它们中存在的值。例如,
chunkArray [0] [0] [0] = 5;
chunkArray [0] [0] [7] = 9; <当我将前2个指数保持不变时,我会得到13来支付
。
假设A代表CITY,B代表ITEM,C代表TIME 。
这些是大块的3个尺寸。
城市项目时间
0- X 0-i1 0-t1
1- Y 1-i2 1-t2
2- Z 2-i3 2-t3
措施可能是销售。 keeing A = 0 B = 0并聚合alon C,我将一直在城市X中获得项目Y的销售
I will be using the chunks to aggregate the values present in them. For eg,
chunkArray[0][0][0] = 5;
chunkArray[0][0][7] = 9;
when i aggregate keeping the first 2 indices constant, i ll get 13.
Say A represents CITY, B represent ITEM and C represents TIME.
These are the 3 dimensions for the chunk.
CITY ITEM TIME
0- X 0-i1 0-t1
1- Y 1-i2 1-t2
2- Z 2-i3 2-t3
the measure may be sales. keeing A=0 B=0 and aggregating alon C, I will get the sales in city X for item Y for all the times
您使用的是C还是C ++?如果你正在使用C ++并且你期望数组非常稀疏我想象地图会很好用。
另一方面如果数组只是略微稀疏(只有少数几个)缺少值)然后使用静态数组可能同样简单且不那么复杂。
Are you using C or C++? If you are using C++ and you expect the array to be quite sparse I imagine a map would work quite well.
On the other hand if the array is only slightly sparse (only a few missing values) then it might be just as easy and less complicated to use a static array.
这篇关于使用链表的多维稀疏数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!