Creating CSV of information extracted from filenames in a given format
Problem description
I have a little script that lists the paths of all files in a directory and all its subdirectories, and parses each path on the list with a regex in Perl.
#!/bin/sh
find * -type f | while read j; do
echo $j | perl -n -e '/\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/ && print "\"0\";\"$1$2$3\";\"$4\";\"$5\";$fl\""' >> bss.csv
echo | readlink -f -n "$j" >>bss.csv
echo \">>bss.csv
done
Output:
"0";"13957";"4121113";"2";"/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg"
I am using readlink from GNU coreutils: -n suppresses the newline at the end, and -f performs canonicalization by recursively following symlinks on the path.
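A small sketch of what those two readlink flags do, assuming GNU coreutils and a throwaway directory from mktemp (the names real.txt and link.txt are hypothetical). -f resolves the symlink to the canonical path of its target, and -n leaves off the trailing newline, which is why a quote printed right afterwards lands on the same line.

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
touch "$dir/real.txt"
ln -s real.txt "$dir/link.txt"    # relative symlink pointing at real.txt (hypothetical names)

# -f: follow the symlink and print the canonical absolute path of its target.
canon=$(readlink -f "$dir/link.txt")
printf 'canonical: %s\n' "$canon"

# -n: no trailing newline, so the quote printed next stays on the same line.
line=$(readlink -f -n "$dir/link.txt"; printf '"')
printf '%s\n' "$line"

rm -r "$dir"
```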
The problem is that when the input string does not match the regex, I still get a line containing only the file path.
How can I add a condition that checks whether the regex matched, so that the path is shown if it did and nothing is shown otherwise? I have racked my brain over various combinations, but haven't found one that works properly.
Recommended answer
Description of solution
In Perl, use if (/…/) {…} else {…} instead of /…/ && …. That way you can print only when the match succeeds and run some other code otherwise.
If that is not the problem and you only want to get rid of the separate readlink output and the closing quote, you can call readlink from Perl using backticks.
I turned everything into a single Perl program. It uses File::Find instead of the find command, assumes that $fl at the end of print in Perl is a relic (and ignores it), and uses Cwd::realpath() instead of readlink -f from GNU coreutils to find the canonical path of the file. If you still want to use readlink, feel free to change Cwd::realpath($_) to `readlink -f -n '$_'` (including the backticks!; -n keeps readlink's trailing newline out of the field), but then it will not work for filenames containing a single quote.
You should call this script as ./script-name starting-directory > bss.csv. If you put it in the directory you are examining, the script itself and bss.csv will be examined too.
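#!/usr/bin/perl
# Usage: ./$0 [<starting-directory>...]
use strict;
use warnings;
use File::Find;
use Cwd;
no warnings 'File::Find';   # silence File::Find's own warnings (e.g. unreadable directories)
sub handleFile {
    return if not -f;        # skip everything that is not a regular file
    if ($File::Find::name =~ /\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/) {
        # Match: print one CSV record, fields quoted and separated by ';'.
        local $, = ';';
        local $\ = "\n";
        print map "\"$_\"", 0, $1.$2.$3, $4, $5 // '', Cwd::realpath($_);
    } else {
        # No match: report on stderr only, so nothing ends up in the CSV.
        print STDERR "File $File::Find::name did not match\n";
    }
}
find(\&handleFile, @ARGV ? @ARGV : '.');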
For reference, I also enclose a polished version of the original program. As I suggested above, it calls readlink from Perl, and it makes real use of Perl's -n option, avoiding the while read loop.
#!/bin/sh
find . -type f | perl -n -e 'chomp; m{/(\d{2})/(\d{2})/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?} && print qq{"0";"$1$2$3";"$4";"$5";"}, `readlink -f -n '\''$_'\''`, qq{"\n}' > bss.csv
Other remarks on the original code
- The echo | before readlink does nothing and should be removed: readlink does not read its standard input.
- Where does the $fl at the end of print in Perl come from? I assume it is a relic.
- Using generic quotes like qq{} and choosing delimiters thoughtfully (e.g. in regex matching and other quote-like operators) can save you from quoting hell. I already used this tip above: /…/ → m{…} and "…" → qq{…}. Thanks, Slade! See the perlop manpage (section "Quote and Quote-like Operators") for more info.