密集和稀疏矩阵的有效(时间和空间复杂度)数据结构 [英] Efficient (time and space complexity) data structure for dense and sparse matrix

查看:656
本文介绍了密集和稀疏矩阵的有效(时间和空间复杂度)数据结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我必须读取储存有汽车矩阵的档案( 1 = BlueCar,2 = RedCar,0 =空白)。

I have to read a file in which is stored a matrix with cars (1=BlueCar, 2=RedCar, 0=Empty).

我需要以这种方式编写一个算法来移动矩阵的


  • blue 向下移动;

  • 红色向右移动;

  • 有一个转向,其中所有蓝色的移动和转动所有红色的移动。

  • blue ones move downward;
  • red ones move rightward;
  • there is a turn in which all the blue ones move and a turn to move all the red ones.

在读取文件之前,我不知道矩阵大小,如果它是密集或稀疏的,所以我必须实现两个数据结构密集和一个稀疏)和两个算法。

Before the file is read I don't know the matrix size and if it's dense or sparse, so I have to implement two data structures (one for dense and one for sparse) and two algorithms.

我需要达到最好的时间和空间复杂性

由于未知的矩阵大小,我想把数据存储在堆上。

Due to the unknown matrix size, I think to store the data on the heap.

如果矩阵是密集,我认为使用类似:

If the matrix is dense, I think to use something like:

short int** M = new short int*[m];
short int*  M_data = new short int[m*n];

for(int i=0; i< m; ++i) 
{
    M[i] = M_data + i * n;
}

使用这种结构我可以分配一个连续的内存空间,要访问 M [i] [j]

With this structure I can allocate a contiguous space of memory and it is also simple to be accessed with M[i][j].

现在的问题是稀疏情况,我还必须考虑如何以最简单的方式移动汽车通过算法:例如,当我评估一辆汽车,我需要很容易找到如果在下一个位置

Now the problem is the structure to choose for the sparse case, and I have to consider also how I can move the cars through the algorithm in the simplest way: for example when I evaluate a car, I need to find easily if in the next position (downward or rightward) there is another car or if it's empty.

最初,我想要定义从一般Car对象继承的BlueCar和RedCar对象。在这个对象中,我可以保存矩阵坐标,然后将它们放在:

Initially I thought to define BlueCar and RedCar objects that inherits from the general Car object. In this objects I can save the matrix coordinates and then put them in:

std::vector<BluCar> sparseBlu;
std::vector<RedCar> sparseRed;

否则,我可以这样做:

vector< tuple< row, column, value >> sparseMatrix

但是找到下一个位置的问题仍然存在。

But the problem of finding what's in the next position still remains.

这可能不是最好的方法,那么如何以有效的方式实现稀疏的情况呢? (也使用稀疏的唯一结构)

Probably this is not the best way to do it, so how can I implement the sparse case in a efficient way? (also using a unique structure for sparse)

推荐答案

为什么不创建内存映射直接在文件上? (假设您的数据0,1,2存储在文件中的连续字节(或位)中,这些字节的位置也表示汽车的坐标)

Why not simply create a memory mapping directly over the file? (assuming your data 0,1,2 is stored in contiguous bytes (or bits) in the file, and the position of those bytes also represents the coordinates of the cars)

这样,您不需要分配额外的内存并读取所有数据,并且可以使用 M [i] [j] 简单有效地访问数据。

This way you don't need to allocate extra memory and read in all the data, and the data can simply and efficiently be accessed with M[i][j].

跳过这些行会有L1缓存友好。

Going over the rows would be L1-cache friendly.

如果数据非常稀疏,扫描数据一次,并保存一个空的区域/块在内存中的列表(只需要存储startpos和大小),然后你可以跳过(并在需要时调整)进一步运行。

In case of very sparse data, you could scan through the data once and keep a list of the empty regions/blocks in memory (only need to store startpos and size), which you could then skip (and adjust where needed) in further runs.

通过内存映射,只有频繁访问的页面保存在内存中。这意味着,一旦扫描了空白区域,内存将只分配给经常访问的非空白区域(所有这些都将由内核自动完成 - 无需自己跟踪)。

With memory mapping, only frequently accessed pages are kept in memory. This means that once you have scanned for the empty regions, memory will only be allocated for the frequently accessed non-empty regions (all this will be done automagically by the kernel - no need to keep track of it yourself).

另一个好处是直接访问操作系统磁盘缓存。因此,不需要在内核空间和用户空间之间保持复制和移动数据。

Another benefit is that you are accessing the OS disk cache directly. Thus no need to keep copying and moving data between kernel space and user space.

为了进一步优化空间和内存使用,汽车可以存储在2位档案。

To further optimize space- and memory usage, the cars could be stored in 2 bits in the file.

更新


将不得不使用openMP和MPI来移动汽车...内存映射
是否也与并发线程一起工作?

I'll have to move cars with openMP and MPI... Will the memory mapping work also with concurrent threads?

当然可以使用多线程,但不确定openMP是否会是这里的最佳解决方案,因为如果你在同一时间处理数据的不同部分,你可能需要检查一些重叠区域(即一辆汽车可以从一个块移动到另一个)。

You could certainly use multithreading, but not sure if openMP would be the best solution here, because if you work on different parts of the data at the same time, you may need to check some overlapping regions (i.e. a car could move from one block to another).

或者你可以让线程在块的中间部分工作,然后启动其他线程来做边界(红色的车会是一个字节,蓝色汽车全行)。

Or you could let the threads work on the middle parts of the blocks, and then start other threads to do the boundaries (with red cars that would be one byte, with blue cars a full row).

您还需要一个锁定机制来调整稀疏区域的列表。我认为最好的方法是启动单独的线程(取决于数据的大小当然)。

You would also need a locking mechanism for adjusting the list of the sparse regions. I think the best way would be to launch separate threads (depending on the size of the data of course).

这篇关于密集和稀疏矩阵的有效(时间和空间复杂度)数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆