获取外部表的位置和文件名? [英] Getting an external table's location and file name?

查看:155
本文介绍了获取外部表的位置和文件名?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在将多个文件作为外部表的一部分进行处理.有什么办法可以让我在外部表中处理文件名并将其放置在表中?

I'm processing multiple files as part of an external table. Is there any way I can get the file name being processed in external tables and place it in the table?

目前,我能找到的唯一解决方案是将文件名附加到平面文件中的每个记录,这从效率的角度来看并不理想,并且涉及修改原始数据.当然,外部表可以随时知道正在处理哪个文件吗?

At the moment the only solution I can find is appending the file name to every record in the flat file which isn't ideal from an efficiency standpoint and involves modifying the original data. Surely external tables know what file is being processed at any time?

推荐答案

我不知道有什么方法可以在访问参数中捕获文件名.解决方法是,可以使用预处理器动态添加文件名,而不是修改原始文件.如果您有两个文件,例如包含a,b,1file_1.csv和包含c,d,2file_2.csv,则可能有一个小的外壳脚本,例如append_filename.sh:

I'm not aware of any way to capture the file name within the access parameters. As a workaround, rather than modifying the original files you could use a preprocessor to append the file name on the fly. If you had two files, say file_1.csv containing a,b,1 and file_2.csv containing c,d,2, you could have a small shell script like append_filename.sh:

#!/bin/bash
while read line
do
  printf "%s,%s\n" "${line}" "${1##*/}"
done < $1

您可以通过直接调用脚本来验证它的作用:

which you can verify does something helpful by calling the script directly:

$ ./append_filename.sh file_1.csv
a,b,1,file_1.csv

然后,您可以定义您的外部表,以通过 preprocessor子句,类似于:

You can then define your external table to call that via the preprocessor clause, something like:

create table e42 (
  col1 varchar2(10),
  col2 varchar2(10),
  col3 number,
  filename varchar2(30)
)
organization external (
  type oracle_loader
  default directory d42
  access parameters (
    records delimited by newline
    preprocessor 'append_filename.sh'
    fields terminated by ','
  )
  location ('file_1.csv', 'file_2.csv')
);

Table E42 created.

然后自动获取文件名:

select * from e42;

COL1       COL2             COL3 FILENAME                     
---------- ---------- ---------- ------------------------------
a          b                   1 file_1.csv                    
c          d                   2 file_2.csv                    

我已经剥离了目录路径,所以您只能看到文件名-如果愿意,可以保留完整路径,但这可能不是必需的,并且可以向只能查询表的人员显示OS详细信息.注意安全准则;我在这里通过使用一个目录来保持所有内容的简单性,但是您应该将预处理器放在其他位置.当然,这是假设使用Unix-y平台或GNU工具.如果您使用的是Windows,则批处理文件应该可以实现类似的操作.

I've stripped the directory path so you only see the file name - you could keep the full path if you prefer, but that may not be necessary and could reveal OS details to people who can only query the table. Note the security guidelines; I've kept it simple here by using one directory for everything, but you should put the preprocessor somewhere else. And of course this is assuming a Unix-y platform or GNU tools; something similar should be possible with a batch file if you're using Windows.

对于大文件,这种逐行读取的方法相对较慢;在我的平台上,用150万行的测试文件附加了文件名大约花了80秒.其他内置工具将更快.对于同一文件,使用sed的该版本仅花费一秒钟的时间:

This approach reading line-by-line will be relatively slow for large files; with a 1.5 million-row test file appending the file name took about 80 seconds on my platform. Other built-in tools will be faster; this version with sed takes just over a second for the same file:

#!/bin/bash
sed -e 's!$!,'"${1##*/}"'!' $1

您也可以尝试其他替代方法,例如awk;您可能需要进行一些测试,以了解哪种方法在您的环境中最有效(或足够快).

You could try other alternative such as awk too; you'd probably need to test a few to see what works best (or fast enough) in your environment.

这篇关于获取外部表的位置和文件名?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆