Ignore commas between quotation marks when reading a CSV file with UTL_FILE in an Oracle stored procedure
Question
Sample data:
1, test, "xy, yz", dog, cat
2, test2, xy, fish, bear
As you can see above, the third column may or may not be enclosed in quotes, with a comma as part of the string.
I am simply trying to find a way to ignore the comma within the quotes, if the quotes are there. I am using UTL_FILE and can't use external tables. I know, I know, but I am not the one who gets to make the decisions.
I haven't really "tried" anything yet, because I haven't been able to find anything to try in this particular scenario. I have code that extracts the substrings between each comma, but obviously it splits "xy, yz" into two separate strings. I have been able to get it to work on ONE row by using " as the delimiter for that specific column, but that falls apart when the next row has no quotes around that column.
Answer
If you can get the input file delivered with all fields enclosed in double quotes (or can translate it after reading; my regex isn't good enough), you can use dbms_utility.comma_to_table. The fields need to be enclosed because (as noted here) the tokens have to be valid object names, so an unquoted value such as 1 causes an error. For example:
declare
  file   utl_file.file_type;
  list   varchar2(120);
  tablen binary_integer;
  tab    dbms_utility.lname_array;
begin
  file := utl_file.fopen('MY_DIR', 'test1.csv', 'R');
  loop
    -- get_line raises no_data_found at end of file
    begin
      utl_file.get_line(file => file, buffer => list);
    exception
      when no_data_found then
        exit;
    end;
    dbms_output.put_line('Raw list: ' || list);
    -- split the line; commas inside double quotes are left alone
    dbms_utility.comma_to_table(list => list,
      tablen => tablen, tab => tab);
    for i in 1..tablen
    loop
      -- strip the enclosing double quotes from each token
      dbms_output.put_line('Column ' || i || ': '
        || replace(tab(i), '"'));
    end loop;
  end loop;
  utl_file.fclose(file);
end;
/
Gives:
Raw list: "1","test","xy, yz","dog","cat"
Column 1: 1
Column 2: test
Column 3: xy, yz
Column 4: dog
Column 5: cat
Raw list: "2","test2","xy","fish","bear"
Column 1: 2
Column 2: test2
Column 3: xy
Column 4: fish
Column 5: bear
Raw list: "3","test3","ab, cd","rabbit, rabbit","duck"
Column 1: 3
Column 2: test3
Column 3: ab, cd
Column 4: rabbit, rabbit
Column 5: duck
If the fields aren't all quoted, you can use a regular expression instead (pattern from here):
declare
  file    utl_file.file_type;
  list    varchar2(120);
  -- a field is either a double-quoted string or a run of non-comma characters
  pattern varchar2(15) := '("[^"]*"|[^,]+)';
  c       sys_refcursor;
  i       number;
  f       varchar2(20);
begin
  file := utl_file.fopen('MY_DIR', 'test2.csv', 'R');
  loop
    -- get_line raises no_data_found at end of file
    begin
      utl_file.get_line(file => file, buffer => list);
    exception
      when no_data_found then
        exit;
    end;
    dbms_output.put_line('Raw list: ' || list);
    -- one row per field: removing every match leaves only the
    -- separating commas, so the field count is that length plus one
    open c for
      select level as col,
        regexp_substr(list, pattern, 1, rownum) split
      from dual
      connect by level <= length(regexp_replace(list, pattern)) + 1;
    loop
      fetch c into i, f;
      exit when c%notfound;
      -- strip the enclosing double quotes, if any
      dbms_output.put_line('Column ' || i || ': ' || replace(f, '"'));
    end loop;
    close c;
  end loop;
  utl_file.fclose(file);
end;
/
Gives:
Raw list: 1,test,"xy, yz",dog,cat
Column 1: 1
Column 2: test
Column 3: xy, yz
Column 4: dog
Column 5: cat
Raw list: 2,test2,xy,fish,bear
Column 1: 2
Column 2: test2
Column 3: xy
Column 4: fish
Column 5: bear
Raw list: 3,test3,"ab, cd","rabbit, rabbit",duck
Column 1: 3
Column 2: test3
Column 3: ab, cd
Column 4: rabbit, rabbit
Column 5: duck
I'm not sure if you really have spaces between the fields as shown in the question. If so, the first method still works and you can add a trim() around tab(i). The second method breaks down, because a space before an opening quote makes the quoted field match as a run of non-comma characters, so it would need a bit of adjustment...
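One possible adjustment for the second method, offered as a sketch rather than a tested fix: let the pattern skip any spaces that follow a comma, and take only the captured field. It assumes Oracle 11g or later (for regexp_count and the subexpression argument of regexp_substr), that spaces appear only after commas and not between a closing quote and the next comma, and that there are no empty fields. The row is hard-coded here instead of being read with utl_file.get_line, purely for illustration.

```sql
declare
  -- hard-coded sample row from the question (assumption: in the real
  -- procedure this would come from utl_file.get_line)
  list    varchar2(120) := '1, test, "xy, yz", dog, cat';
  -- optional leading spaces, then either a quoted string or a run of
  -- non-comma characters; the parentheses capture the field itself
  pattern varchar2(30)  := ' *("[^"]*"|[^,]+)';
  f       varchar2(120);
begin
  for i in 1 .. regexp_count(list, pattern)
  loop
    -- sixth argument 1 returns only the first subexpression,
    -- i.e. the field without the skipped leading spaces
    f := regexp_substr(list, pattern, 1, i, null, 1);
    -- drop the double quotes and any trailing spaces
    dbms_output.put_line('Column ' || i || ': ' || trim(replace(f, '"')));
  end loop;
end;
/
```

For the sample row this is intended to report five columns, with "xy, yz" kept together as column 3. Empty fields (two consecutive commas) would still be skipped, as in the original pattern, so the column numbering would shift on such rows.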