数据完整性SQL查询可修复表中的错误 [英] Data Integrity SQL query to fix the bug in the table
问题描述
当前,我在维度表中的其中一个表上面临数据完整性问题,下面是详细信息
Currently i am facing a data integrity issue on one of the table from dimension table ,below are the details
create table TEST_TBL
(
JOB_ID NUMBER(38,0),
LOC_SID NUMBER(38,0),
CITY VARCHAR2(180 BYTE),
CITY_NM_CLR VARCHAR2(50 BYTE),
CITY_NM_OFFICIAL VARCHAR2(180 BYTE),
DISTRICT VARCHAR2(120 BYTE),
DISTRICT_CD VARCHAR2(20 BYTE),
IN_CNTRY_REG VARCHAR2(100 BYTE),
CNTRY_NM VARCHAR2(255 BYTE),
POSTAL_CD VARCHAR2(15 BYTE),
POPUL_APROX VARCHAR2(15 BYTE),
LONGITUDE NUMBER,
LATITUDE NUMBER,
GLOBAL_REG VARCHAR2(40 BYTE),
CNTRY_CD_2CHAR VARCHAR2(2 BYTE),
CNTRY_CD_3CHAR VARCHAR2(3 BYTE),
CNTRY_CD_NUMER VARCHAR2(3 BYTE),
CNTRY_NM_OFFICIAL VARCHAR2(255 BYTE),
CNTRY_CALL_CD VARCHAR2(10 BYTE),
BANK_CNTRY_CD NUMBER(38,0),
CONTINENT VARCHAR2(30 BYTE),
MONETARY_REG VARCHAR2(40 BYTE),
EFFECTIVE_DT_FROM DATE,
EFFECTIVE_DT_TO DATE,
VERSION NUMBER(38,0),
UPDATED_BY_ETL_JOB NUMBER(38,0)
);
Insert into TEST_TBL values (1234,'81910','N/A',null,'N/A','N/A','0','N/A','USA','N/A','0','0','0','NA','EN','ENG','001',null,'91','0','North_America',null,to_date('01.01.00 00:00:00','DD.MM.YY HH24:MI:SS'),to_date('31.12.99 23:59:59','DD.MM.YY HH24:MI:SS'),'1',null);
请注意:
唯一索引已分配给LOC_SID(主键),以下列CITY_NM_CLR,CNTRY_CD_2CHAR的非唯一索引已就位
Unique index is already assigned to LOC_SID (primary key) non unique index is already inplace for the following columns CITY_NM_CLR,CNTRY_CD_2CHAR
当前情况
当我从表中检查完整性问题时,我得到了大约300多个提示(以上是同一数据集记录)
when i am checking the integrity issue from the table i am getting many recrods approx 300+ (above is the same data set record)
下面是我用来检查完整性的SQL查询
Below is the sql query i am using to check the integrity
SELECT count(*) AS RowAffected
FROM
(SELECT LOC_SID,
VERSION,
EFFECTIVE_DT_FROM,
EFFECTIVE_DT_TO,
CITY,
POSTAL_CD
FROM
(SELECT t.*,
LEAD(EFFECTIVE_DT_FROM, 1) OVER(PARTITION BY CITY, POSTAL_CD
ORDER BY EFFECTIVE_DT_FROM) AS next_date,
LEAD(VERSION, 1) OVER(PARTITION BY CTY, POSTAL_CD
ORDER BY EFFECTIVE_DT_FROM) AS next_version
FROM TEST_TBL t)
WHERE valid_to != next_date
OR VERSION = next_version)
修复后,当我运行上述查询时,它不应返回任何结果
after the fix when i run the above query it should not return any result
让我概述一下带有样本数据的计数
Let me give a overview of counts with sample data
CITY POSTAL_CD COUNT(*)
N/A N/A 502
TOKYO N/A 30
HABSIGUDA 3452 2
DELHI N/A 4
ASSAM 1153 1
ASSAM 1290 1
ASSAM 1310 1
ASSAM 1781 1
ASSAM 1982 1
样本数据
场景-1
LOC_SID CITY POSTAL_CD EFFECTIVE_DT_FROM EFFECTIVE_DT_TO VERSION
82141 N/A N/A 01.01.1900 31.12.2199 1
82142 N/A N/A 01.01.1900 31.12.2199 1
我有400这样的行
场景-2
LOC_SID CITY POSTAL_CD EFFECTIVE_DT_FROM EFFECTIVE_DT_TO VERSION
46211 TOKYO N/A 01.01.1900 31.12.2199 1
46212 TOKYO N/A 01.01.1900 31.12.2199 1
我有30个此类行
场景-3
LOC_SID CITY POSTAL_CD EFFECTIVE_DT_FROM EFFECTIVE_DT_TO VERSION
57372 HABSIGUDA 3452 01.01.1900 29.12.2017 1
61321 HABSIGUDA 3452 29.12.2017 31.12.2199 2
67371 UPPAL 4922 01.01.1900 31.12.2199 1
80737 HABSIGUDA 3452 01.01.1900 31.12.2199 1
我只有一行
场景-4
LOC_SID CITY POSTAL_CD EFFECTIVE_DT_FROM EFFECTIVE_DT_TO VERSION
4822 DELHI N/A 01.01.1900 31.12.2199 1
4825 DELHI 3911 01.01.1900 31.12.2199 1
4826 DELHI N/A 01.01.1900 31.12.2199 1
4827 DELHI N/A 01.01.1900 31.12.2199 1
4828 DELHI N/A 01.01.1900 31.12.2199 1
4829 DELHI N/A 01.01.1900 31.12.2199 1
我有4行
场景-5
LOC_SID CITY POSTAL_CD EFFECTIVE_DT_FROM EFFECTIVE_DT_TO VERSION
25101 ASSAM 1153 01.01.1900 31.12.2199 1
25102 ASSAM 1153 01.01.1900 31.12.2199 1
25103 ASSAM 1290 01.01.1900 31.12.2199 1
25104 ASSAM 1290 01.01.1900 31.12.2199 1
25105 ASSAM 1310 01.01.1900 31.12.2199 1
25106 ASSAM 1310 01.01.1900 31.12.2199 1
25107 ASSAM 1781 01.01.1900 31.12.2199 1
25108 ASSAM 1781 01.01.1900 31.12.2199 1
25109 ASSAM 1982 01.01.1900 31.12.2199 1
25110 ASSAM 1982 01.01.1900 31.12.2199 1
我有300行
基本上LOC_SID是PK并且是增量的,因此在预期的版本是增量的之后,下面是执行更新语句然后在db表中的预期结果
basically LOC_SID is PK and incremental ,so after the version expected is incremental below is the expected result after the update statement is executed then in the db table
**结果(运行更新语句后)**
** results (after running the update statement )**
场景-4
LOC_SID CITY POSTAL_CD EFFECTIVE_DT_FROM EFFECTIVE_DT_TO VERSION
4822 DELHI N/A 01.01.1900 01.01.1900 1
4825 DELHI 3911 01.01.1900 31.12.2199 1
4826 DELHI N/A 01.01.1900 01.01.1900 2
4827 DELHI N/A 01.01.1900 01.01.1900 3
4828 DELHI N/A 01.01.1900 01.01.1900 4
4829 DELHI N/A 01.01.1900 01.01.1900 5
推荐答案
获取表TEST_TBL
的backup
并尝试以下操作
Take a backup
of your table TEST_TBL
and try the below
执行以下两个UPDATE
直到更新zero rows
Do the following two UPDATE
till you get zero rows
updated
BEGIN
WHILE(1=1)
LOOP
UPDATE TEST_TBL a
set version=(select prev_version+1 from
(
SELECT
t.*,
ROWID,
Lag(version, 1)
over(
PARTITION BY cty, postal_cd
ORDER BY effective_dt_from) AS prev_version
from TEST_TBL T)t
where a.rowid=t.rowid
AND a.version <=t.prev_version)
WHERE EXISTS (select 1 from (
SELECT
t.*,
ROWID,
Lag(version, 1)
over(
PARTITION BY cty, postal_cd
ORDER BY effective_dt_from) AS prev_version
from TEST_TBL T)t
where a.rowid=t.rowid
AND a.version <=t.prev_version);
commit;
EXIT WHEN SQL%ROWCOUNT=0;
END LOOP;
END;
第二次更新
BEGIN
WHILE(1=1)
LOOP
UPDATE TEST_TBL b
set valid_to=(select next_date from
(
SELECT
t.*,
ROWID,
LEAD(EFFECTIVE_DT_FROM, 1) OVER(PARTITION BY CITY, POSTAL_CD
ORDER BY EFFECTIVE_DT_FROM) AS next_date
from test_tbl T)t
where b.rowid=t.rowid
AND valid_to != next_date)
WHERE EXISTS (select 1 from (
SELECT
t.*,
ROWID,
LEAD(EFFECTIVE_DT_FROM, 1) OVER(PARTITION BY CITY, POSTAL_CD
ORDER BY EFFECTIVE_DT_FROM) AS next_date
from test_tbl T)t
where b.rowid=t.rowid
AND valid_to != next_date);
commit;
EXIT WHEN SQL%ROWCOUNT=0;
END LOOP;
END;
这篇关于数据完整性SQL查询可修复表中的错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!