How to store a sparse queryable matrix on disk or database?


Problem description

I need to store a sparse matrix on disk. It is like a database table with millions of rows and thousands of columns, where many or most columns are null. It needs to be queryable, like a SQL SELECT with a WHERE on some of the columns.

I specifically need this to work in Java. I first thought of using Berkeley DB for Java to simulate a table, but it does not support querying based on values.

Then I thought about using a regular SQL database, for example by creating a schema with only a row ID, a column ID, and the value. A virtual row would then be all the actual rows that share the same row ID. But this looks like database abuse.

Any ideas?

Solution

The first thing that came to my mind when reading the question heading was one database row per (x,y) cell, as you suggested in your next-to-last paragraph.
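To make the row-per-(x,y) idea concrete, here is a minimal in-memory sketch in Java. It is not a disk-backed implementation: the class `SparseTripleStore` and its methods are hypothetical names chosen for illustration, and the secondary map only stands in for what would be an index on (column ID, value) in a real SQL table.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model of a (row_id, col_id, value) table.
// All names are illustrative; a real setup would be a three-column SQL table.
class SparseTripleStore {
    // Cell storage: row id -> (column id -> value). Absent cells are simply not stored.
    private final Map<Long, Map<Integer, String>> rows = new HashMap<>();
    // Stand-in for a database index on (col_id, value): column id -> value -> row ids.
    private final Map<Integer, Map<String, Set<Long>>> index = new HashMap<>();

    // Insert or overwrite one cell, keeping the index consistent.
    void put(long rowId, int colId, String value) {
        Map<Integer, String> row = rows.computeIfAbsent(rowId, k -> new TreeMap<>());
        String old = row.put(colId, value);
        if (old != null) {
            // Drop the stale index entry for the overwritten value.
            index.get(colId).get(old).remove(rowId);
        }
        index.computeIfAbsent(colId, k -> new HashMap<>())
             .computeIfAbsent(value, k -> new HashSet<>())
             .add(rowId);
    }

    // Rough equivalent of: SELECT row_id FROM cells WHERE col_id = ? AND value = ?
    Set<Long> rowsWhere(int colId, String value) {
        Map<String, Set<Long>> byValue = index.get(colId);
        if (byValue == null) {
            return new HashSet<>();
        }
        Set<Long> hit = byValue.get(value);
        return hit == null ? new HashSet<>() : new HashSet<>(hit);
    }

    // Reassemble the "virtual row": every stored cell that shares this row id.
    Map<Integer, String> virtualRow(long rowId) {
        Map<Integer, String> row = rows.get(rowId);
        return row == null ? new TreeMap<>() : new TreeMap<>(row);
    }
}
```

A condition on several columns would be the intersection of the sets returned by `rowsWhere` for each column, which in SQL corresponds to a self-join or a grouped query on the cell table.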

The other thing to note is that databases often compress the rows, particularly for NULLs, so the straightforward representation may not waste as much space as you think.
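A back-of-the-envelope comparison can show why the straightforward wide table may not be as wasteful as it looks. The model below is only a sketch: it assumes a storage engine that records NULLs in a per-row bitmap at one bit per column, and every constant passed in (row overhead, ID sizes, value size) is an assumption for illustration rather than a measurement of any particular database. Index space is ignored for both layouts.

```java
// Rough size model; all parameters are assumptions, not measured figures.
class SparseSizeEstimate {
    // Wide layout: one physical row per matrix row, NULLs tracked in a bitmap
    // (assumed 1 bit per column), and only non-null values stored.
    static long wideRowBytes(int totalCols, int nonNullCells, int valueBytes, int rowOverheadBytes) {
        long nullBitmapBytes = (totalCols + 7) / 8;
        return rowOverheadBytes + nullBitmapBytes + (long) nonNullCells * valueBytes;
    }

    // Triple layout: one physical row per non-null cell, each carrying its own
    // row overhead plus a row id and a column id.
    static long tripleRowBytes(int nonNullCells, int valueBytes, int rowOverheadBytes,
                               int rowIdBytes, int colIdBytes) {
        return (long) nonNullCells * (rowOverheadBytes + rowIdBytes + colIdBytes + valueBytes);
    }

    public static void main(String[] args) {
        // Hypothetical matrix row: 2000 columns, 20 non-null 8-byte values,
        // 24 bytes of per-row overhead, 8-byte row ids, 4-byte column ids.
        System.out.println("wide:   " + wideRowBytes(2000, 20, 8, 24));
        System.out.println("triple: " + tripleRowBytes(20, 8, 24, 8, 4));
    }
}
```

Under these assumed numbers the wide row comes to 434 bytes and the triple layout to 880 bytes for the same matrix row, because the per-row overhead is paid once per cell in the triple layout. The balance shifts toward the triple layout as rows get sparser or as the column count exceeds what the database allows in a single table.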
