Binary to binary cast with JSONb


Question



How to avoid the unnecessary CPU cost?

See this historic question with failure tests. Example: j->'x' is a JSONb value representing a number and j->'y' a boolean. From the first versions of JSONb (released in 2014 with PostgreSQL 9.4) until today (6 years later!), with PostgreSQL v12, it seems that we need to enforce a double conversion:

  1. Discard the j->'x' "binary JSONb number" information and transform it into the printable string j->>'x';
    discard the j->'y' "binary JSONb boolean" information and transform it into the printable string j->>'y'.

  2. Parse the string to obtain a "binary SQL float" by casting: (j->>'x')::float AS x;
    parse the string to obtain a "binary SQL boolean" by casting: (j->>'y')::boolean AS y.
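
For illustration, a minimal self-contained query showing this double conversion (the JSONb literal is made up for the example):

  SELECT (j->>'x')::float   AS x,  -- number:  binary JSONb -> text -> binary SQL float
         (j->>'y')::boolean AS y   -- boolean: binary JSONb -> text -> binary SQL boolean
  FROM (SELECT '{"x": 1.5, "y": true}'::jsonb AS j) t;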

Is there no syntax or optimized function that lets a programmer enforce the direct conversion?

I don't see it in the guide... Or was it never implemented: is there a technical barrier to it?


NOTES about a typical scenario where we need it

(responding to comments)

Imagine a scenario where your system needs to store many, many small datasets (a real example!) with minimal disk usage, managing everything with centralized control/metadata/etc. JSONb is a good solution, and offers at least 2 good alternatives for storage in the database:

  1. Metadata (with a schema descriptor) and the whole dataset in an array of arrays;
  2. Separating metadata and table rows into two tables.

(and variations where metadata is translated to a cache of text[], etc.)
Alternative 1, monolithic, is the best for the "minimal disk usage" requirement, and faster for full information retrieval. Alternative 2 can be the choice for random access or partial retrieval, when the table Alt2_DatasetLine also has one more column, like time, for time series.
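
A sketch of what the two alternatives could look like as DDL (Alt1_AllDataset and Alt2_DatasetLine appear in the original description; the remaining identifiers are assumptions):

  -- Alternative 1: metadata plus the whole dataset in a single JSONb value
  CREATE TABLE Alt1_AllDataset (
    dataset_id int PRIMARY KEY,
    metadata   jsonb,  -- schema descriptor, e.g. '{"d":"date","t":"text",...}'
    j_alldata  jsonb   -- array with one element per dataset row
  );

  -- Alternative 2: metadata and dataset lines in two separate tables
  CREATE TABLE Alt2_Dataset (
    dataset_id int PRIMARY KEY,
    metadata   jsonb
  );
  CREATE TABLE Alt2_DatasetLine (
    dataset_id int REFERENCES Alt2_Dataset(dataset_id),
    j          jsonb  -- one dataset row
  );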

You can create all the SQL VIEWs in a separate schema, for example:

CREATE VIEW mydatasets.t1234 AS
  SELECT (j->>'d')::date AS d,  j->>'t' AS t,  (j->>'b')::boolean AS b,
         (j->>'i')::int AS i,  (j->>'f')::float AS f
  FROM (
   SELECT jsonb_array_elements(j_alldata) j FROM Alt1_AllDataset
   WHERE dataset_id=1234
  ) t
  -- or FROM alt2...
;
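
The view then behaves like an ordinary typed table, for example (a hypothetical query):

  SELECT d, i, f FROM mydatasets.t1234 WHERE b AND d >= DATE '2020-01-01';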

And the CREATE VIEW statements can all be automated by running the SQL string dynamically... we can reproduce the above "stable schema casting" with simple formatting rules extracted from the metadata:

SELECT string_agg( CASE
   WHEN x[2]!='text' THEN format(E'(j->>\'%s\')::%s AS %s', x[1], x[2], x[1])
   ELSE format(E'j->>\'%s\' AS %s', x[1], x[1])
  END, ',' ) AS x2  -- the SELECT list for the CREATE VIEW statement
FROM (
 SELECT regexp_split_to_array(trim(x), '\s+') x  -- x[1]=column name, x[2]=type
 FROM regexp_split_to_table('d date, t text, b boolean, i int, f float', ',') t1(x)
) t2;
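
A minimal sketch of the dynamic step, feeding the generated column list into CREATE VIEW through EXECUTE in a DO block (it assumes the mydatasets schema exists; the view name and dataset id are illustrative):

  DO $$
  DECLARE
    cols text;
  BEGIN
    -- build the SELECT list from the metadata string, as above
    SELECT string_agg( CASE
        WHEN x[2] != 'text' THEN format(E'(j->>\'%s\')::%s AS %s', x[1], x[2], x[1])
        ELSE format(E'j->>\'%s\' AS %s', x[1], x[1])
      END, ',' )
    INTO cols
    FROM (
      SELECT regexp_split_to_array(trim(x), '\s+') x
      FROM regexp_split_to_table('d date, t text, b boolean, i int, f float', ',') t1(x)
    ) t2;

    -- run the CREATE VIEW with the generated column list
    EXECUTE format(
      'CREATE VIEW mydatasets.t1234 AS
         SELECT %s
         FROM (SELECT jsonb_array_elements(j_alldata) j
               FROM Alt1_AllDataset WHERE dataset_id = 1234) t',
      cols);
  END
  $$;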

... It's a real-life scenario, and this (apparently ugly) model is surprisingly fast for small-traffic applications. There are other advantages besides the disk usage reduction: flexibility (you can change a dataset schema without changing the SQL schema) and scalability (2, 3, ... 1 billion different datasets on the same table).

Returning to the question: imagine a dataset with ~50 or more columns; the SQL VIEW would be faster if PostgreSQL offered a "binary to binary" cast.

Solution

Short answer: no, there is no better way to extract a jsonb number in PostgreSQL than (for example)

CAST(j ->> 'attr' AS double precision)

A JSON number happens to be stored as a PostgreSQL numeric internally, so that wouldn't work "directly" anyway. But there is no fundamental reason why there could not be a more efficient way to extract such a value as numeric.

So, why don't we have that?

  1. Nobody has implemented it. That is often an indication that nobody thought it worth the effort. I personally think that this would be a micro-optimization – if you want to go for maximum efficiency, you extract that column from the JSON and store it directly as a column in the table (a sketch of this follows the list).

    It is not necessary to modify the PostgreSQL source to do this. It is possible to write your own C function that does exactly what you envision. If many people thought this was beneficial, I'd expect that somebody would already have written such a function.

  2. PostgreSQL has just-in-time compilation (JIT). So if an expression like this is evaluated for a lot of rows, PostgreSQL will build executable code for that on the fly. That mitigates the inefficiency and makes it less necessary to have a special case for efficiency reasons (a demonstration follows the list).

  3. It might not be quite as easy as it seems for many data types. JSON standard types don't necessarily correspond to PostgreSQL types in all cases. That may seem contrived, but look at this recent thread in the Hackers mailing list that deals with the differences between the numeric types of JSON and PostgreSQL.
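
As a sketch of the "store it directly as a column" suggestion from point 1: on PostgreSQL 12 (the version mentioned in the question) a generated column can pay the extraction cost once, at write time (the table and column names here are hypothetical):

  CREATE TABLE measurements (
    j jsonb,
    -- extracted once on INSERT/UPDATE; reads no longer pay the cast
    x float GENERATED ALWAYS AS ((j->>'x')::float) STORED
  );

And for point 2, JIT can be forced on even a cheap query to see it kick in (assuming a server built with LLVM support):

  SET jit = on;
  SET jit_above_cost = 0;           -- JIT-compile regardless of plan cost
  EXPLAIN (ANALYZE) SELECT (j->>'x')::float FROM measurements;
  -- when JIT was used, the plan output ends with a "JIT:" section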

All of the above are not reasons that such a feature could never exist; I just wanted to give reasons why we don't have it.
