Ignore commas between quotation marks when reading a CSV file with UTL_FILE in an Oracle stored procedure

Problem description

Sample data:

1, test, "xy, yz", dog, cat
2, test2, xy, fish, bear

As you can see above, the third column may or may not be enclosed in quotes, and when it is, it can contain a comma as part of the string.

I am simply trying to find a way to ignore a comma inside the quotes, if there is one. I am using UTL_FILE and can't use external tables. I know, I know, but I am not the one who gets to make the decisions.

I haven't really "tried" anything yet, because I haven't found anything to try for this particular scenario. I have code that extracts the substrings between each pair of commas, but it obviously splits "xy, yz" into two separate strings. I got it to work on ONE row by using " as the delimiter for that specific column, but that falls apart when the next row has no quotes around the third column.

Recommended answer

If you can get the input file delivered with all fields enclosed in double quotes (or can translate it after reading; my regex isn't good enough), you can use dbms_utility.comma_to_table. The fields need to be enclosed because (as noted here) the tokens have to be valid object names, so an unquoted 1 causes an error. For example:

declare
    file utl_file.file_type;
    list varchar2(120);
    tablen binary_integer;
    tab dbms_utility.lname_array;
begin
    file := utl_file.fopen('MY_DIR', 'test1.csv', 'R');
    loop
        begin
            utl_file.get_line(file => file, buffer => list);
        exception
            when no_data_found then
                exit;
        end;
        dbms_output.put_line('Raw list: ' || list);

        dbms_utility.comma_to_table(list => list,
            tablen => tablen, tab => tab);

        for i in 1..tablen
        loop
            dbms_output.put_line('Column ' || i || ': '
                || replace(tab(i), '"'));
        end loop;
    end loop;
    utl_file.fclose(file);
end;
/

This gives:

Raw list: "1","test","xy, yz","dog","cat"
Column 1: 1
Column 2: test
Column 3: xy, yz
Column 4: dog
Column 5: cat
Raw list: "2","test2","xy","fish","bear"
Column 1: 2
Column 2: test2
Column 3: xy
Column 4: fish
Column 5: bear
Raw list: "3","test3","ab, cd","rabbit, rabbit","duck"
Column 1: 3
Column 2: test3
Column 3: ab, cd
Column 4: rabbit, rabbit
Column 5: duck

If the fields aren't all quoted, you can use a regex instead (pattern from here):

declare
    file utl_file.file_type;
    list varchar2(120);
    pattern varchar2(15) := '("[^"]*"|[^,]+)';
    c sys_refcursor;
    i number;
    f varchar2(20);
begin
    file := utl_file.fopen('MY_DIR', 'test2.csv', 'R');
    loop
        begin
            utl_file.get_line(file => file, buffer => list);
        exception
            when no_data_found then
                exit;
        end;

        dbms_output.put_line('Raw list: ' || list);

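        -- Removing every match leaves only the separating commas,
        -- so their count plus one is the number of fields.
        open c for
            select level as col,
                regexp_substr(list, pattern, 1, level) split
            from dual
            connect by level <= length(regexp_replace(list, pattern)) + 1;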

        loop
            fetch c into i, f;
            exit when c%notfound;
            dbms_output.put_line('Column ' || i || ': ' || replace(f, '"'));
        end loop;
        close c;

    end loop;
    utl_file.fclose(file);
end;
/

This gives:

Raw list: 1,test,"xy, yz",dog,cat
Column 1: 1
Column 2: test
Column 3: xy, yz
Column 4: dog
Column 5: cat
Raw list: 2,test2,xy,fish,bear
Column 1: 2
Column 2: test2
Column 3: xy
Column 4: fish
Column 5: bear
Raw list: 3,test3,"ab, cd","rabbit, rabbit",duck
Column 1: 3
Column 2: test3
Column 3: ab, cd
Column 4: rabbit, rabbit
Column 5: duck
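
One caveat with the regex approach: [^,]+ needs at least one character, so an empty field (two adjacent commas) is skipped and the later columns shift left. If that matters, a character-by-character scan is an alternative. The block below is only a sketch: split_csv and field_list are hypothetical names, the sample line is made up, and it assumes that quotes only ever wrap whole fields (a doubled "" inside a field is simply dropped) and that leading and trailing spaces are never significant.

```plsql
declare
    type field_list is table of varchar2(4000);
    fields field_list;

    -- Hypothetical helper: split one CSV line on commas that are
    -- outside double quotes. Quotes are removed, fields are trimmed.
    function split_csv(p_line in varchar2) return field_list is
        result_list field_list := field_list();
        buf         varchar2(4000);
        in_quotes   boolean := false;
        ch          varchar2(1 char);
    begin
        for pos in 1 .. nvl(length(p_line), 0)
        loop
            ch := substr(p_line, pos, 1);
            if ch = '"' then
                -- Toggle the quoted state and drop the quote itself.
                in_quotes := not in_quotes;
            elsif ch = ',' and not in_quotes then
                -- A comma outside quotes ends the current field.
                result_list.extend;
                result_list(result_list.count) := trim(buf);
                buf := null;
            else
                buf := buf || ch;
            end if;
        end loop;
        -- The last field has no trailing comma.
        result_list.extend;
        result_list(result_list.count) := trim(buf);
        return result_list;
    end split_csv;
begin
    -- Made-up line with a quoted comma and an empty fourth field.
    fields := split_csv('3, test3, "ab, cd", , duck');
    for i in 1 .. fields.count
    loop
        dbms_output.put_line('Column ' || i || ': ' || fields(i));
    end loop;
end;
/
```

In the file-reading loops above, the literal would be replaced by the buffer filled by utl_file.get_line.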

I'm not sure whether you really have spaces between the fields, as shown in the question. If so, the first method still works, and you can add a trim() around tab(i). The second method breaks down, because the leading space means a quoted field no longer starts with a quote, so it would need a bit of adjustment...
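
One possible adjustment, offered as an untested sketch rather than a definitive fix, is to let the pattern absorb optional leading spaces before each field and then trim the result. The sample lines are taken from the question and stand in for what utl_file.get_line would return; the sample_list type and the samples variable are hypothetical names. It assumes spaces appear only after a comma, never between a closing quote and the next comma.

```plsql
declare
    -- Hypothetical stand-in for lines read with utl_file.get_line.
    type sample_list is table of varchar2(120);
    samples sample_list := sample_list(
        '1, test, "xy, yz", dog, cat',
        '2, test2, xy, fish, bear');
    -- Same pattern as above, prefixed with ' *' to absorb leading spaces.
    pattern varchar2(20) := ' *("[^"]*"|[^,]+)';
    list    varchar2(120);
    n       pls_integer;
    f       varchar2(120);
begin
    for r in 1 .. samples.count
    loop
        list := samples(r);
        dbms_output.put_line('Raw list: ' || list);
        -- Removing every field leaves only the separating commas.
        n := nvl(length(regexp_replace(list, pattern)), 0) + 1;
        for i in 1 .. n
        loop
            f := regexp_substr(list, pattern, 1, i);
            -- Strip the quotes, then the leading space picked up by ' *'.
            dbms_output.put_line('Column ' || i || ': '
                || trim(replace(f, '"')));
        end loop;
    end loop;
end;
/
```

Because the quoted alternative is now reached after the leading space is consumed, "xy, yz" should come back as a single field instead of being split at its inner comma.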
