比较表中的行对 [英] Comparing pairs of rows in a table

查看:78
本文介绍了比较表中的行对的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我可以在查询中使用一些补救帮助,该查询比较同一表中的行。我正在修改用于推送到Postgres的代码,并设置了测试平台以使用我们的新旧代码进行推送。行数很好,但这并不能告诉我数据是否相同。为此,我意识到我可以使用内置功能来获取行的哈希值。这里是我的简单表格之一:

I could use some remedial help on a query that compares rows in the same table. I'm revamping our code for pushing to Postgres, and have set up a test bed to push using our old and new code. The row counts are fine, but that doesn't tell me if the data is the same. For that, I realized I can use built-in features to get a hash of the row. Here's one of my simple tables as a start:

CREATE TABLE IF NOT EXISTS data.hsys (
    "id" uuid NOT NULL DEFAULT NULL,
    "marked_for_deletion" boolean NOT NULL DEFAULT false,
    "name_" citext NOT NULL DEFAULT NULL,

CONSTRAINT hsys_id_pkey
    PRIMARY KEY ("id")
);

然后散列仅采用此行:

select hashtext(hsys::text) from hsys;

我要做的是存储ID,哈希,代码版本和表名每次测试都将其推送到一个小表中:

What I want to do is store the ID, hash, code version and table name for each row after each test push into a small table:

CREATE TABLE IF NOT EXISTS data.row_check (
    id         uuid NOT NULL DEFAULT NULL,
    version    int8 NOT NULL DEFAULT NULL,
    row_hash   int8 NOT NULL DEFAULT NULL,
    table_name text NOT NULL DEFAULT NULL,

CONSTRAINT row_check_pkey
    PRIMARY KEY (id, version)
);

将数据插入row_check并不难。概念验证查询如下所示:

Inserting the data into row_check isn't hard. The proof-of-concept query looks like this:

select 
   id,
    0 as version,
    hashtext(hsys::text)  as row_hash,
   'hsys' as table_name,
from hsys;

有效的插入查询如下:

INSERT INTO row_check (id,version,row_hash,table_name)
            SELECT id, 0, hashtext(hsys::text),'hsys' 
            FROM hsys

            ON CONFLICT ON CONSTRAINT row_check_pkey DO UPDATE SET
                row_hash   = EXCLUDED.row_hash,
                table_name = EXCLUDED.table_name;

数据到位后,我将看到如下内容:

Once the data is in place, I can then see it like this:

select * from row_check
order by 1,2
limit 6;

id                                    version   row_hash    table_name
17ea1ed4-87b0-0840-912f-d29de2a06f5d    0      -1853961325  hsys
17ea1ed4-87b0-0840-912f-d29de2a06f5d    1      -1853961325  hsys
2200d1da-73e7-419c-9e4c-efe020834e6f    0      -482794730   hsys
2200d1da-73e7-419c-9e4c-efe020834e6f    1       482794730   hsys   <--- Different from version 0
47f4a50e-2358-434b-b30d-1f707ea9ee1b    0      -1539190835  hsys
47f4a50e-2358-434b-b30d-1f707ea9ee1b    1      -1539190835  hsys

我理想地从这样的样本中得到的是:

What I'd ideally like from a sample like that is:

table_name id                                       v0         v1
hsys       2200d1da-73e7-419c-9e4c-efe020834e6f 0   -482794730  482794730

但是即使这个最小的结果也会有所帮助:

But even this minimal result would be helpful:

2200d1da-73e7-419c-9e4c-efe020834e6f    hsys

这就是我在哪里我陷入了困境。我要构建的是对row_check的查询,该查询会发现哈希在版本之间不同的所有ID。我上面有版本0和1。有人可以为我指出正确的方向进行分组和/或加入,以便仅获得各版本之间不匹配的行吗?这些是我需要发现和追查的危险信号。我确实需要返回ID和表名,版本和哈希是次要的。我对此有一些限制,其中有一些帮助:

And this is where I'm stumped. What I would like to build is a query on row_check that spots any IDs where the hash is different between versions. I've got versions 0 and 1 above. Can someone point me in the right direction for grouping and/or joining to get only the rows that don't match across versions? Those are my red flags that I need to spot and track down. I really need the ID and table name back, the version and hash are of secondary importance. I've got a few constraints going into this, some of which help:


  • ID值在所有表中都是唯一的。

  • The ID values are unique across all tables.

我一次只能比较两个版本。

I'll only compare two versions at a time.

我有几十个

某些表具有数百万行。

最后一点可能很重要。我虽然在CTE中使用 SELECT DISTINCT id ,但是我对此并没有走远。

That last point might matter. I though of using SELECT DISTINCT id in a CTE, but I didn't get far with that.

谢谢

推荐答案

一旦有了row_check表,就可以通过这种方式加入表

Once you have your row_check table you could join the table this way

 select a.id
      , a.version
      , a.row_hash
      , b.id
      , b.version
      , b.row_hash      
from row_check a 
INNER JOIN row_check b on a.id = b.id  
  AND a.version = 0 
    AND b.version  = 1 
      AND a.row_hash <> b.row_hash

这篇关于比较表中的行对的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆