Finding missing numbers from a sequence after getting the sequence from a string?


Problem description

I have millions of string records like this one, in 310 types with different formats, from which I need to get the sequence, year, month and day.

The script gets the sequence, year, month and day. Now I want PL/SQL that gets the max and min values of the sequence and finds the missing numbers where the year and month are, for example, 14 - 06. How?

Solution

You don't want to be looking at dual at all here; certainly not attempting to insert. You need to track the highest and lowest values you've seen as you iterate through the loop. Since some of the elements of ename represent dates, I'm pretty sure you want all your matches to be 0-9, not 1-9. You're also referring to the cursor name as you access its fields, instead of the record variable name:

  FOR List_ENAME_rec IN List_ENAME_cur loop
    if REGEXP_LIKE(List_ENAME_rec.ENAME,'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]') then 
      V_seq := substr(List_ENAME_rec.ename,5,4);
      V_Year := substr(List_ENAME_rec.ename,10,2);
      V_Month := substr(List_ENAME_rec.ename,13,2);
      V_day := substr(List_ENAME_rec.ename,16,2);

      if min_seq is null or V_seq < min_seq then
        min_seq := v_seq;
      end if;
      if max_seq is null or V_seq > max_seq then
        max_seq := v_seq;
      end if;

    end if;
  end loop;

With values in the table of emp-1111_14_01_01_1111_G1 and emp-1115_14_02_02_1111_G1, that reports max_seq 1115 min_seq 1111.
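If you want to sanity-check the pattern and the substring offsets outside the database, the same loop can be sketched in Python. This is only an illustration, not part of the Oracle solution: the function names and the sample list are made up here, and the slices assume the ID starts at the first character of the string, just as the substr calls do.

```python
import re

# Same pattern as the REGEXP_LIKE call; re.search is unanchored, like REGEXP_LIKE.
PATTERN = re.compile(
    r"emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]"
)

def parse_ename(ename):
    """Return (seq, yy, mm, dd), or None if the name does not match."""
    if not PATTERN.search(ename):
        return None
    # Oracle substr(ename, 5, 4) is 1-based, so it maps to the slice [4:8].
    return int(ename[4:8]), ename[9:11], ename[12:14], ename[15:17]

def min_max_seq(enames):
    """Track the lowest and highest sequence seen, skipping non-matching names."""
    min_seq = max_seq = None
    for ename in enames:
        parsed = parse_ename(ename)
        if parsed is None:
            continue
        seq = parsed[0]
        if min_seq is None or seq < min_seq:
            min_seq = seq
        if max_seq is None or seq > max_seq:
            max_seq = seq
    return min_seq, max_seq

# Hypothetical sample: the two IDs mentioned above plus one non-matching string.
sample = ["emp-1111_14_01_01_1111_G1", "emp-1115_14_02_02_1111_G1", "not-an-id"]
print(min_max_seq(sample))  # (1111, 1115)
```

Note that the sequence is converted to a number before comparing, so the min/max comparison is numeric rather than character-based.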

If you really wanted to involve dual you could do this inside the loop, instead of the if/then/assign pattern, but it's not necessary:

      select least(min_seq, v_seq), greatest(max_seq, v_seq)
      into min_seq, max_seq
      from dual;

I have no idea what the procedure is going to do; there seems to be no relationship between whatever you've got in test1 and the values you're finding.

You don't need any PL/SQL for this though. You can get the min/max values from a simple query:

select min(to_number(substr(ename, 5, 4))) as min_seq,
  max(to_number(substr(ename, 5, 4))) as max_seq
from table1
where status = 2
and regexp_like(ename,
  'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')

   MIN_SEQ    MAX_SEQ
---------- ----------
      1111       1115 

And you can use those to generate a list of all values in that range:

with t as (
  select min(to_number(substr(ename, 5, 4))) as min_seq,
    max(to_number(substr(ename, 5, 4))) as max_seq
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
)
select min_seq + level - 1 as seq
from t
connect by level <= (max_seq - min_seq) + 1;

       SEQ
----------
      1111 
      1112 
      1113 
      1114 
      1115 

And a slightly different common table expression to see which of those don't exist in your table, which I think is what you're after:

with t as (
  select to_number(substr(ename, 5, 4)) as seq
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
),
u as (
  select min(seq) as min_seq,
    max(seq) as max_seq
  from t
),
v as (
  select min_seq + level - 1 as seq
  from u
  connect by level <= (max_seq - min_seq) + 1
)
select v.seq as missing_seq
from v
left join t on t.seq = v.seq
where t.seq is null
order by v.seq;

MISSING_SEQ
-----------
       1112 
       1113 
       1114 

or if you prefer:

...
select v.seq as missing_seq
from v
where not exists (select 1 from t where t.seq = v.seq)
order by v.seq;

SQL Fiddle.
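The idea behind both variants is the same: generate every value between the minimum and maximum, then keep the ones that are not present. A minimal Python sketch of that logic, with a made-up function name and assuming the sequences have already been extracted as integers:

```python
def missing_in_range(seqs):
    """Return the values between min(seqs) and max(seqs) that are absent from seqs."""
    present = set(seqs)
    if not present:
        return []
    # range() plays the role of the "connect by level" row generator.
    return [n for n in range(min(present), max(present) + 1) if n not in present]

print(missing_in_range([1111, 1115]))  # [1112, 1113, 1114]
```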


Based on comments I think you want the missing values for the sequence for each combination of the other elements of the ID (YY_MM_DD). This will give you that breakdown:

with t as (
  select to_number(substr(ename, 5, 4)) as seq,
    substr(ename, 10, 2) as yy,
    substr(ename, 13, 2) as mm,
    substr(ename, 16, 2) as dd
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
),
r (yy, mm, dd, seq, max_seq) as (
  select yy, mm, dd, min(seq), max(seq)
  from t
  group by yy, mm, dd
  union all
  select yy, mm, dd, seq + 1, max_seq
  from r
  where seq + 1 <= max_seq
)
select yy, mm, dd, seq as missing_seq
from r
where not exists (
  select 1 from t
  where t.yy = r.yy
  and t.mm = r.mm
  and t.dd = r.dd
  and t.seq = r.seq
)
order by yy, mm, dd, seq;

With output like:

YY   MM   DD    MISSING_SEQ 
---- ---- ---- -------------
14   01   01            1112 
14   01   01            1113 
14   01   01            1114 
14   02   02            1118 
14   02   02            1120 
14   02   03            1127 
14   02   03            1128 

SQL Fiddle.
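The per-date breakdown amounts to grouping by (yy, mm, dd) and looking for gaps within each group separately. As a rough Python sketch of that logic only: the function name is hypothetical, and the rows below are invented so that the result resembles the first few lines of the output above; they are not the actual table contents.

```python
from collections import defaultdict

def missing_by_date(rows):
    """rows: iterable of (yy, mm, dd, seq). Returns sorted (yy, mm, dd, missing_seq)."""
    groups = defaultdict(set)
    for yy, mm, dd, seq in rows:
        groups[(yy, mm, dd)].add(seq)

    missing = []
    for key in sorted(groups):
        present = groups[key]
        # Walk from the group's own minimum to its own maximum, like the recursive CTE r.
        for n in range(min(present), max(present) + 1):
            if n not in present:
                missing.append(key + (n,))
    return missing

# Invented rows, for illustration only.
rows = [
    ("14", "01", "01", 1111),
    ("14", "01", "01", 1115),
    ("14", "02", "02", 1117),
    ("14", "02", "02", 1119),
    ("14", "02", "02", 1121),
]
for row in missing_by_date(rows):
    print(row)
```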

If you want to look for a particular date you could filter on that (either in t, or in the first branch of r), but you could also change the regex pattern to include the fixed values; so to look for 14 06 the pattern would be 'emp[-][0-9]{4}_14_06_[0-9]{2}[_][0-9]{4}[_][G][1]', for example. That's harder to generalise though, so a filter (where t.yy = '14' and t.mm = '06') might be more flexible.


If you insist on having this in a procedure, you can make the date elements optional and modify the regex pattern:

create or replace procedure show_missing_seqs(yy in varchar2 default '[0-9]{2}',
  mm in varchar2 default '[0-9]{2}', dd in varchar2 default '[0-9]{2}') as

  pattern varchar2(80);
  cursor cur (pattern varchar2) is
    with t as (
      select to_number(substr(ename, 5, 4)) as seq,
        substr(ename, 10, 2) as yy,
        substr(ename, 13, 2) as mm,
        substr(ename, 16, 2) as dd
      from table1
      where status = 2
      and regexp_like(ename, pattern)
    ),
    r (yy, mm, dd, seq, max_seq) as (
      select yy, mm, dd, min(seq), max(seq)
      from t
      group by yy, mm, dd
      union all
      select yy, mm, dd, seq + 1, max_seq
      from r
      where seq + 1 <= max_seq
    )
    select yy, mm, dd, seq as missing_seq
    from r
    where not exists (
      select 1 from t
      where t.yy = r.yy
      and t.mm = r.mm
      and t.dd = r.dd
      and t.seq = r.seq
    )
    order by yy, mm, dd, seq;
begin
  pattern := 'emp[-][0-9]{4}[_]'
    || yy || '[_]' || mm || '[_]' || dd
    || '[_][0-9]{4}[_][G][1]';
  for rec in cur(pattern) loop
    dbms_output.put_line(to_char(rec.missing_seq, 'FM0000'));
  end loop;
end show_missing_seqs;
/

I don't know why you insist it has to be done like this, or why you want to use dbms_output, as you're relying on the client/caller displaying that; what will your job do with the output? You could make this return a sys_refcursor, which would be more flexible. But anyway, you can call it like this from SQL*Plus/SQL Developer:

set serveroutput on
exec show_missing_seqs(yy => '14', mm => '01');

anonymous block completed
1112
1113
1114
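The way the procedure assembles its pattern from optional parts can be tried out in isolation. Here is a small Python sketch of the same concatenation; the function name and the constant are made up for illustration, and the defaults mirror the procedure's '[0-9]{2}' defaults:

```python
import re

ANY_2_DIGITS = "[0-9]{2}"  # same default as the procedure's parameters

def build_pattern(yy=ANY_2_DIGITS, mm=ANY_2_DIGITS, dd=ANY_2_DIGITS):
    """Build the ID pattern; any omitted date element matches any two digits."""
    return ("emp[-][0-9]{4}[_]" + yy + "[_]" + mm + "[_]" + dd
            + "[_][0-9]{4}[_][G][1]")

pattern = build_pattern(yy="14", mm="01")
print(pattern)
print(bool(re.search(pattern, "emp-1111_14_01_01_1111_G1")))  # True
print(bool(re.search(pattern, "emp-1115_14_02_02_1111_G1")))  # False
```

Because the arguments are concatenated straight into the regular expression, a caller can pass a regex fragment as well as a fixed value, which is flexible but also means the inputs need to be trusted.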
