Representing Sparse Data in PostgreSQL


Question

What's the best way to represent a sparse data matrix in PostgreSQL? The two obvious methods I see are:

  1. Store the data in a single table with a separate column for every conceivable feature (potentially millions), with a default value of NULL for unused features. This is conceptually very simple, but I know that with most RDBMS implementations this is typically very inefficient, since NULL values usually take up some space. However, I read an article (unfortunately I can't find the link) that claimed PG doesn't use any data space for NULL values, making it better suited for storing sparse data.

  2. Create separate "row" and "column" tables, as well as an intermediate table that links them and stores the value for the column at that row. I believe this is the more traditional RDBMS solution, but there's more complexity and overhead associated with it.
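To make option 2 concrete, here is a minimal sketch of the row/column/link-table layout. It uses Python's built-in sqlite3 module as a stand-in so it runs without a PostgreSQL server; the table names (`item`, `feature`, `cell`) and the helper `features_of` are hypothetical, and the DDL would likely need only small type adjustments for PostgreSQL.

```python
import sqlite3

# SQLite (in-memory) stands in for PostgreSQL here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item    (item_id    INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name  TEXT NOT NULL UNIQUE);
    -- Link table: one record per non-empty cell of the sparse matrix.
    CREATE TABLE cell (
        item_id    INTEGER NOT NULL REFERENCES item(item_id),
        feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
        value      REAL    NOT NULL,
        PRIMARY KEY (item_id, feature_id)
    );
""")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany(
    "INSERT INTO feature VALUES (?, ?)",
    [(1, "height"), (2, "weight"), (3, "age")],
)
# Only the cells that are actually set get stored.
conn.executemany("INSERT INTO cell VALUES (?, ?, ?)", [(1, 1, 1.8), (2, 3, 42.0)])


def features_of(conn, item_id):
    """Return the non-empty features of one item as a {name: value} dict."""
    rows = conn.execute(
        """
        SELECT f.name, c.value
        FROM cell AS c
        JOIN feature AS f USING (feature_id)
        WHERE c.item_id = ?
        ORDER BY f.name
        """,
        (item_id,),
    ).fetchall()
    return dict(rows)
```

Storage grows with the number of populated cells rather than with rows times columns, at the cost of a join for every lookup.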

I also found PostgreDynamic, which claims to better support sparse data, but I don't want to switch my entire database server to a PG fork just for this feature.

Are there any other solutions? Which one should I use?

Solution

A few solutions spring to mind:

1) Separate your features into groups that are usually set together, and create a table for each group with a one-to-one foreign key relationship to the main data. Join only the tables you need when querying.

2) Use the EAV anti-pattern: create a 'feature' table with a foreign key to your primary table plus a fieldname column and a value column, and store the features as rows in that table instead of as attributes in your primary table.

3) Similarly to how PostgreDynamic does it, create a table for each 'column' in your primary table (they use a separate namespace for those tables), and create functions to simplify (and efficiently index) accessing and updating the data in those tables.

4) Create a column in your primary table of type XML or VARCHAR and store your data in it in some structured text format. Index the data with functional indexes, and write functions to update it (or use the XML functions if you are using that format).

5) Use the contrib/hstore module to create a column of type hstore, which holds key-value pairs and can be indexed and updated.

6) Live with lots of empty fields.
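As a rough illustration of option 1, the sketch below splits features into per-group tables and joins only the group a query needs. It again uses Python's sqlite3 module as a stand-in for PostgreSQL so it is runnable as-is; the table names (`main_data`, `shipping_features`, `review_features`), the sample values, and the helper `with_shipping` are all made up for the example.

```python
import sqlite3

# SQLite (in-memory) stands in for PostgreSQL here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_data (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One table per group of features that tend to be set together.
    -- Using the foreign key as the primary key keeps the relation one-to-one.
    CREATE TABLE shipping_features (
        main_id   INTEGER PRIMARY KEY REFERENCES main_data(id),
        weight_kg REAL,
        carrier   TEXT
    );
    CREATE TABLE review_features (
        main_id      INTEGER PRIMARY KEY REFERENCES main_data(id),
        rating       REAL,
        review_count INTEGER
    );
""")
conn.executemany(
    "INSERT INTO main_data VALUES (?, ?)", [(1, "widget"), (2, "gadget")]
)
# A row exists in a group table only when that group is set for the item.
conn.execute("INSERT INTO shipping_features VALUES (1, 2.5, 'ACME')")
conn.execute("INSERT INTO review_features VALUES (2, 4.0, 10)")


def with_shipping(conn):
    """Join only the shipping group; items without it come back with NULLs."""
    return conn.execute(
        """
        SELECT m.id, m.name, s.weight_kg, s.carrier
        FROM main_data AS m
        LEFT JOIN shipping_features AS s ON s.main_id = m.id
        ORDER BY m.id
        """
    ).fetchall()
```

The review table is never touched by `with_shipping`, which is the point of grouping: each query pays only for the feature groups it reads.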
