Creating CSV of information extracted from filenames in a given format


Problem description

I have a small script that lists the paths of all files in a directory and its subdirectories and parses each path with a regex in Perl.

#!/bin/sh
find * -type f | while read j; do
echo $j | perl -n -e '/\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/ && print "\"0\";\"$1$2$3\";\"$4\";\"$5\";$fl\""' >> bss.csv
echo | readlink -f -n "$j" >>bss.csv
echo \">>bss.csv
done

Output:

"0";"13957";"4121113";"2";"/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg"

I am using readlink from GNU coreutils: -n suppresses the trailing newline, and -f canonicalizes the path by recursively following every symlink in it.

The problem is that when an input string does not match the regex, I get a line containing only the file path.

How can I add a condition so that the path is printed only when the regex matched, and nothing is printed otherwise? I have tried various combinations but have not found one that works properly.

Recommended answer

Description of solution

In Perl, use if (/…/) {…} else {…} instead of /…/ && …. That way you can print when the match succeeds and run some other code when it does not.

If that is not the problem and you only want to get rid of the stray readlink output and closing quote, you can call readlink from Perl using backticks.

I turned everything into a single Perl program, used File::Find instead of the find command, assumed that the $fl at the end of the Perl print is a relic (and ignored it), and used Cwd::realpath() instead of readlink -f from GNU coreutils to find the canonical path of each file. If you still want to use readlink -f, feel free to change Cwd::realpath($_) to `readlink -f -n '$_'` (including the backticks!), but then it will not work for filenames containing a single quote. The -n keeps readlink's trailing newline out of the quoted field.

Call this script as ./script-name starting-directory > bss.csv. If you put the script in the directory you are examining, it will be visited too, along with bss.csv.

#!/usr/bin/perl
# Usage: ./$0 [<starting-directory>...]
use strict;
use warnings;
use File::Find;
use Cwd;
no warnings 'File::Find';

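sub handleFile {
    return if not -f;    # skip directories and other non-regular files
    if ($File::Find::name =~ /\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/) {
        # Localize both separators; "local $, = ';', $\ = ..." would localize only $,
        # and leave $\ set globally, doubling the newline of later STDERR messages.
        local $, = ';';
        local $\ = "\n";
        # $5 is undefined when the optional _<digit> suffix is absent (// needs Perl 5.10+).
        print map "\"$_\"", 0, $1.$2.$3, $4, $5 // '', Cwd::realpath($_);
    } else {
        print STDERR "File $File::Find::name did not match\n";
    }
}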

find(\&handleFile, @ARGV ? @ARGV : '.');

For reference, I also enclose a polished version of the original program. It calls readlink from Perl, as suggested above, and makes real use of Perl's -n option, which avoids the while read loop.

#!/bin/sh
find . -type f | perl -l -n -e 'm{/(\d{2})/(\d{2})/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?} or next; my $path = `readlink -f -n '\''$_'\''`; print qq{"0";"$1$2$3";"$4";"$5";"$path"}' > bss.csv

Other remarks on the original code

  • The echo | before readlink does nothing and should be removed; readlink does not read its stdin.
  • Where does the $fl at the end of the Perl print come from? I assume it is a relic.
  • Using generic quotes such as qq{} and choosing delimiters thoughtfully (e.g. in regex matching and other quote-like operators) can save you from quoting hell. I already used this tip above: /…/ became m{…} and "…" became qq{…}. Thx, Slade! See the perlop manpage for more info.