Sparse matrices / arrays in Java

Published: 2021/11/25 14:36:50 | Tags: java, algorithm, sparse-matrix, sparse-array

Question

I'm working on a project, written in Java, which requires that I build a very large 2-D sparse array. Very sparse, if that makes a difference. Anyway: the most crucial aspect for this application is efficiency in terms of time (assume loads of memory, though not nearly so unlimited as to allow me to use a standard 2-D array; the key range is in the billions in both dimensions).

Out of the kajillion cells in the array, there will be several hundred thousand cells which contain an object. I need to be able to modify cell contents VERY quickly.

Anyway: does anyone know a particularly good library for this purpose? It would have to be under a Berkeley (BSD), LGPL, or similar license (no GPL, as the product can't be entirely open-sourced). Or if there's just a very simple way to make a homebrew sparse array object, that'd be fine too.
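
For reference, the "very simple homebrew" option could look roughly like the sketch below: only non-empty cells are stored, in a `HashMap` keyed by one `long` that packs the row and column. The class name `SparseGrid` and its methods are hypothetical, and the sketch assumes both coordinates fit in a non-negative `int` (up to about 2.1 billion); a key range beyond that would need a different key scheme. It also boxes a `Long` on every access, which is exactly the cost the answer below criticizes.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical minimal sparse 2-D array: only non-empty cells are stored,
 * keyed by a single long that packs (row, col).
 * Assumes 0 <= row, col <= Integer.MAX_VALUE.
 */
class SparseGrid<V> {
    private final Map<Long, V> cells = new HashMap<>();

    // Pack two non-negative ints into one long: row in the high 32 bits, col in the low 32.
    private static long key(int row, int col) {
        if (row < 0 || col < 0) {
            throw new IndexOutOfBoundsException("negative index: " + row + ", " + col);
        }
        return ((long) row << 32) | col;
    }

    /** Returns the stored value, or null for an empty cell. */
    V get(int row, int col) {
        return cells.get(key(row, col));
    }

    /** Stores a value; storing null empties the cell so the map stays sparse. */
    void set(int row, int col, V value) {
        long k = key(row, col);
        if (value == null) {
            cells.remove(k);
        } else {
            cells.put(k, value);
        }
    }

    /** Number of non-empty cells. */
    int size() {
        return cells.size();
    }

    public static void main(String[] args) {
        SparseGrid<String> grid = new SparseGrid<>();
        grid.set(2_000_000_000, 1_999_999_999, "far corner");
        grid.set(0, 0, "origin");
        System.out.println(grid.get(2_000_000_000, 1_999_999_999)); // far corner
        System.out.println(grid.get(5, 5));                         // null
        grid.set(0, 0, null);
        System.out.println(grid.size());                            // 1
    }
}
```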

I'm considering MTJ, but haven't heard any opinions on its quality.

Answer

Sparse arrays built with hashmaps are very inefficient for frequently read data. The most efficient implementations use a Trie that gives access to a single vector in which the segments are distributed.

A Trie can determine whether an element is present in the table by performing only TWO read-only array-indexing operations, which yield either the effective position where the element is stored or the knowledge that it is absent from the underlying store.

It can also provide a default position in the backing store for the default value of the sparse array, so that you don't need ANY test on the returned index, because the Trie guarantees that every possible source index maps at least to the default position in the backing store (where you'll frequently store a zero, an empty string, or a null object).
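
The two-read lookup and the shared default position can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular library's implementation: the class name `IntTrieArray`, the block size of 1024 (`SHIFT = 10`), and copy-on-first-write block allocation are all choices made here. It handles a one-dimensional, non-negative `int` source index only; the question's 2-D key range in the billions would need additional trie levels or a composed key. The top-level `index` array covers all 2^31 non-negative indices with 2^21 entries (about 8 MB), and every entry starts at 0, pointing at block 0, which holds only the default value.

```java
import java.util.Arrays;

/**
 * Hypothetical two-level trie-backed sparse array for non-negative int indices.
 * index[] maps the high bits of a source index to the start of a block in data[];
 * block 0 is a shared default block that every never-written region points to.
 */
class IntTrieArray<V> {
    private static final int SHIFT = 10;          // assumed block size: 2^10 = 1024 slots
    private static final int BLOCK = 1 << SHIFT;
    private static final int MASK = BLOCK - 1;

    private final int[] index;   // one entry per block of the source range
    private Object[] data;       // single backing vector shared by all blocks
    private int used;            // number of slots of data[] in use

    IntTrieArray(V defaultValue) {
        index = new int[1 << (31 - SHIFT)];       // all zeros: every region -> block 0
        data = new Object[4 * BLOCK];
        Arrays.fill(data, 0, BLOCK, defaultValue); // block 0 = default values
        used = BLOCK;
    }

    /** Read path: exactly two array reads, no hashing, no "is it present?" test. */
    @SuppressWarnings("unchecked")
    V get(int i) {  // assumes i >= 0; a negative i throws ArrayIndexOutOfBoundsException
        return (V) data[index[i >>> SHIFT] + (i & MASK)];
    }

    /** Write path: the first write into a region copies the default block into a fresh block. */
    void set(int i, V value) {
        int hi = i >>> SHIFT;
        if (index[hi] == 0) {
            if (used + BLOCK > data.length) {
                data = Arrays.copyOf(data, Math.max(data.length * 2, used + BLOCK));
            }
            System.arraycopy(data, 0, data, used, BLOCK); // start from default values
            index[hi] = used;
            used += BLOCK;
        }
        data[index[hi] + (i & MASK)] = value;
    }

    public static void main(String[] args) {
        IntTrieArray<String> a = new IntTrieArray<>("");
        a.set(2_000_000_000, "x");
        System.out.println(a.get(2_000_000_000));           // x
        System.out.println(a.get(2_000_000_001).isEmpty()); // true: same block, still default
        System.out.println(a.get(123).isEmpty());           // true: untouched region, block 0
    }
}
```

Because unwritten regions all point at block 0, `get` never branches on absence; the price is one block allocation per region on its first write.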

There exist implementations that support fast-updatable Tries, with an optional "compact()" operation to optimize the size of the backing store at the end of multiple operations. Tries are MUCH faster than hashmaps, because they don't need any complex hashing function and don't need to handle collisions on reads (with hashmaps, you have collisions BOTH for reading and for writing; this requires a loop to skip to the next candidate position, and a test on each of them to compare the effective source index...).
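
One simple form of such a `compact()` step is sketched below, assuming the trie is laid out as an `int[] index` of block start offsets plus one `Object[] data` backing vector with a fixed block size. It merges blocks whose contents are equal so they share a single copy. The name `TrieCompactor` is hypothetical, and this whole-block deduplication is a simplified variant: production tries may also overlap partially matching blocks, which this sketch does not attempt. After compaction a block may be shared by several regions, so any later write would have to copy a shared block first (copy-on-write); that part is not shown.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical compact() for a two-level trie stored as int[] index + Object[] data,
 * where each index entry is the start offset of a blockSize-long block in data.
 * Blocks with equal contents are merged; index is rewritten in place and a new,
 * possibly shorter, backing array is returned.
 */
final class TrieCompactor {
    private TrieCompactor() {}

    static Object[] compact(int[] index, Object[] data, int blockSize) {
        Map<List<Object>, Integer> seen = new HashMap<>(); // block contents -> new offset
        Map<Integer, Integer> remap = new HashMap<>();     // old offset -> new offset
        Object[] out = new Object[data.length];
        int used = 0;
        for (int hi = 0; hi < index.length; hi++) {
            int oldStart = index[hi];
            Integer newStart = remap.get(oldStart);
            if (newStart == null) {
                // Arrays.asList gives element-wise equals/hashCode, so equal blocks collide.
                List<Object> block =
                        Arrays.asList(Arrays.copyOfRange(data, oldStart, oldStart + blockSize));
                newStart = seen.get(block);
                if (newStart == null) {                    // first time these contents appear
                    System.arraycopy(data, oldStart, out, used, blockSize);
                    newStart = used;
                    seen.put(block, newStart);
                    used += blockSize;
                }
                remap.put(oldStart, newStart);
            }
            index[hi] = newStart;
        }
        return Arrays.copyOf(out, used);
    }

    public static void main(String[] args) {
        int[] index = {0, 2, 4, 0};                       // regions 1 and 2 hold equal blocks
        Object[] data = {null, null, "a", "b", "a", "b"}; // block size 2
        Object[] compacted = compact(index, data, 2);
        System.out.println(compacted.length);             // 4
        System.out.println(Arrays.toString(index));       // [0, 2, 2, 0]
    }
}
```

Compaction is meant to run rarely, after a batch of updates, so the boxed keys in its temporary maps do not affect the hot read path.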

In addition, Java HashMaps can only be keyed on objects, and creating an Integer object for each hashed source index (this object creation is needed for every read, not just for writes) is costly in terms of memory operations, as it stresses the garbage collector.

I really wish the JRE included an IntegerTrieMap