Filter folders whose name is a timestamp - pattern matching vs. regex matching using the find utility


Problem description

I am writing a generic shell script that filters files based on a given regex.

My shell script:

files=$(find $path -name $regex)

In one of the cases, I want to filter folders inside a directory. The folder names have the following format:

20161128-20:34:33:432813246
YYYYMMDD-HH:MM:SS:NS

I am unable to arrive at the correct regex.

I am able to get the path of the files inside the folder using the regex '*data.txt', as I know the name of the file inside it.

But it gives me the full path of the file, something like

/path/20161128-20:34:33:432813246/data.txt

What I want is just:

/path/20161128-20:34:33:432813246

Please help me identify the correct regex for my requirement.

Note:

I know how to process the data after

files=$(find $path -name $regex)

But since the script needs to be generic for many use cases, I only need the correct regex to pass in.

Recommended answer

  • Per POSIX, find's -name and -path primaries (tests) use patterns (a.k.a. wildcard expressions, globs) to match filenames and pathnames. While patterns and regular expressions are distantly related, their syntax and capabilities differ significantly; in short, patterns are syntactically simpler but far less powerful.

      • -name matches the pattern against the basename (mere filename) part of an input path only
      • -path matches the pattern against the whole pathname (the full path)
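To make the difference between the two primaries concrete, here is a minimal sketch. The temporary directory and its contents are made up for illustration, and `mktemp -d` is assumed to be available (it is common but not mandated by POSIX).

```shell
# Hypothetical sandbox: one timestamp-named folder holding data.txt.
tmp=$(mktemp -d)
mkdir -p "$tmp/20161128-20:34:33:432813246"
touch "$tmp/20161128-20:34:33:432813246/data.txt"

# -name compares the pattern with the last path component only.
by_name=$(find "$tmp" -name 'data.txt')

# -path compares the pattern with the whole path, so the leading
# directories must be covered as well (here by the first *).
by_path=$(find "$tmp" -path '*/2016*/data.txt')

printf '%s\n' "$by_name" "$by_path"
rm -rf "$tmp"
```

Both commands should report the same file; only the part of the path that the pattern is compared against differs.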

      Both GNU and BSD/macOS find implement nonstandard extensions:

      • -iname and -ipath, which work like their standard-compliant counterparts (based on patterns), except that they match case-insensitively.
      • -regex and -iregex, which test pathnames against a regex (regular expression).
        • Caveat: Both implementations offer at least 2 regex dialects to choose from (-E activates support for extended regular expressions in BSD find, and GNU find allows selecting from several dialects with -regextype), but no two dialects are exactly the same across the two implementations; see the bottom for the gory details.

          Since your folder names follow a fixed-width naming scheme, a pattern would work:

          pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
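Typing that many `[0-9]` groups by hand makes it easy to miscount. One possible way around that, sketched here with a hypothetical helper named `repeat_digit` (not part of find or the shell), is to build the pattern from the field widths of YYYYMMDD-HH:MM:SS:NS and check it against the sample name with `case`, which uses essentially the same pattern syntax.

```shell
# repeat_digit N: print the glob [0-9] N times (hypothetical helper).
repeat_digit() {
  n=$1
  out=
  while [ "$n" -gt 0 ]; do
    out="${out}[0-9]"
    n=$((n - 1))
  done
  printf '%s' "$out"
}

# Field widths taken from YYYYMMDD-HH:MM:SS:NS (8, 2, 2, 2, 9 digits).
d2=$(repeat_digit 2)
pattern="$(repeat_digit 8)-${d2}:${d2}:${d2}:$(repeat_digit 9)"

# Sanity check against the sample folder name from the question.
case '20161128-20:34:33:432813246' in
  $pattern) match=yes ;;
  *)        match=no ;;
esac
printf '%s\n' "$match"
```

The resulting `$pattern` can then be passed to find's -name primary like any hand-written pattern.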
          

          Of course, you can take a shortcut if you don't expect false positives:

          pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'
          

          Note how * and ?, unlike in a regex, are not duplication symbols (quantifiers) that refer to the preceding expression, but by themselves represent any sequence of characters (*) or any single character (?).
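The following sketch shows what the shortcut pattern does and does not accept. It relies on the shell's `case` statement, which uses essentially the same pattern notation as find's -name; `glob_match` is a hypothetical helper introduced only for this illustration.

```shell
# glob_match STRING PATTERN: succeed if STRING matches the glob PATTERN
# (hypothetical helper built on case).
glob_match() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

shortcut='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

# A real timestamp name matches.
glob_match '20161128-20:34:33:432813246' "$shortcut" && echo 'timestamp: match'

# ? stands for any one character, so letters in those slots pass too.
glob_match '2x-1a:2b:3c:4' "$shortcut" && echo 'false positive: match'

# As a glob, [0-9]* means "one digit, then anything"; unlike the regex
# [0-9]*, it cannot match an empty prefix.
glob_match '-20:34:33:4' "$shortcut" || echo 'no leading digit: no match'
```

The second call is the kind of false positive the shortcut trades for brevity.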

          If we put it all together:

          files=$(find "$path" -type d -name "$pattern")
          

          • It's important to double-quote the variable references to protect their values from unwanted shell expansions, notably to preserve any whitespace in the path and to prevent the shell from prematurely globbing the value of $pattern.

              Note that I've added -type d to limit matching to directories (folders), which improves performance and keeps regular files with matching names out of the results.
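As an end-to-end illustration of the quoting and -type d points, the sketch below builds a throwaway directory tree (all names are invented, and `mktemp -d` is assumed to be available) and contrasts the quoted call with what the shell does to an unquoted pattern in a POSIX-style shell such as sh or bash.

```shell
# Hypothetical sandbox: two timestamp folders, one unrelated folder, one file.
tmp=$(mktemp -d)
mkdir "$tmp/20161128-20:34:33:432813246" "$tmp/20161129-08:00:00:000000001" "$tmp/logs"
touch "$tmp/20161128-20:34:33:432813246/data.txt"

pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

# Quoted: find receives the pattern itself and does the matching.
files=$(find "$tmp" -type d -name "$pattern" | sort)
printf '%s\n' "$files"

# Unquoted: when the current directory holds matching names, the shell
# expands the pattern first. Counting the resulting words shows that find
# would receive two folder names here instead of one pattern.
expanded=$(cd "$tmp" && set -- $pattern && echo "$#")
printf 'words after unquoted expansion: %s\n' "$expanded"

rm -rf "$tmp"
```

The quoted call lists only the two timestamp folders; `logs` and `data.txt` are left out.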

              Optional background information:

              Below is a regex feature matrix as of GNU find v4.6.0 and the BSD find found on macOS 10.12.1:

              • GNU find features are listed by the types supported by the -regextype option, with emacs being the default.

              • Note that several posix-*-named regex types are misnomers in that they support features beyond what POSIX mandates.

              BSD find features are listed by basic (using NO regex option, which implies platform-flavored BREs) and extended (using option -E, which implies platform-flavored EREs).

              For cross-platform use, sticking with POSIX EREs (extended regular expressions) while using -regextype posix-extended with GNU find and -E with BSD find is safe, but note that not all features you may expect will be supported, notably \b, \</\> and character class shortcuts such as \d.
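A rough sketch of that cross-platform advice follows. The function name `find_timestamp_dirs` is hypothetical, and the GNU/BSD detection via `find --version` is an assumption that may not hold for other find implementations (BusyBox, for example). Because -regex is matched against the whole path, the ERE starts with `.*/`.

```shell
# find_timestamp_dirs DIR: list directories under DIR whose name looks like
# YYYYMMDD-HH:MM:SS:NS, using only POSIX ERE features (hypothetical helper).
find_timestamp_dirs() {
  dir=$1
  # -regex must match the whole path, hence the leading .*/
  ere='.*/[0-9]{8}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{9}'
  if find --version 2>/dev/null | grep -q 'GNU'; then
    # GNU find: choose the dialect before the -regex test.
    find "$dir" -regextype posix-extended -type d -regex "$ere"
  else
    # Assumed BSD/macOS find: -E switches -regex to extended regexes.
    find -E "$dir" -type d -regex "$ere"
  fi
}
```

It could be called as `find_timestamp_dirs "$path"`. Since the ERE uses only bracket expressions and `{n}` intervals, it stays within the features that the matrix below marks as supported for both posix-extended and BSD extended.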

              =================== GNU find ===================
              == REGEX FEATURE: \{\}
              TYPE: awk:                                        -
              TYPE: egrep:                                      -
              TYPE: ed:                                         ✓
              TYPE: emacs:                                      -
              TYPE: gnu-awk:                                    -
              TYPE: grep:                                       ✓
              TYPE: posix-awk:                                  -
              TYPE: posix-basic:                                ✓
              TYPE: posix-egrep:                                -
              TYPE: posix-extended:                             -
              TYPE: posix-minimal-basic:                        ✓
              TYPE: sed:                                        ✓
              == REGEX FEATURE: {}
              TYPE: awk:                                        -
              TYPE: egrep:                                      ✓
              TYPE: ed:                                         -
              TYPE: emacs:                                      -
              TYPE: gnu-awk:                                    ✓
              TYPE: grep:                                       -
              TYPE: posix-awk:                                  ✓
              TYPE: posix-basic:                                -
              TYPE: posix-egrep:                                ✓
              TYPE: posix-extended:                             ✓
              TYPE: posix-minimal-basic:                        -
              TYPE: sed:                                        -
              == REGEX FEATURE: \+
              TYPE: awk:                                        -
              TYPE: egrep:                                      -
              TYPE: ed:                                         ✓
              TYPE: emacs:                                      -
              TYPE: gnu-awk:                                    -
              TYPE: grep:                                       ✓
              TYPE: posix-awk:                                  -
              TYPE: posix-basic:                                ✓
              TYPE: posix-egrep:                                -
              TYPE: posix-extended:                             -
              TYPE: posix-minimal-basic:                        -
              TYPE: sed:                                        ✓
              == REGEX FEATURE: +
              TYPE: awk:                                        ✓
              TYPE: egrep:                                      ✓
              TYPE: ed:                                         -
              TYPE: emacs:                                      ✓
              TYPE: gnu-awk:                                    ✓
              TYPE: grep:                                       -
              TYPE: posix-awk:                                  ✓
              TYPE: posix-basic:                                -
              TYPE: posix-egrep:                                ✓
              TYPE: posix-extended:                             ✓
              TYPE: posix-minimal-basic:                        -
              TYPE: sed:                                        -
              == REGEX FEATURE: \b
              TYPE: awk:                                        -
              TYPE: egrep:                                      ✓
              TYPE: ed:                                         ✓
              TYPE: emacs:                                      ✓
              TYPE: gnu-awk:                                    ✓
              TYPE: grep:                                       ✓
              TYPE: posix-awk:                                  -
              TYPE: posix-basic:                                ✓
              TYPE: posix-egrep:                                ✓
              TYPE: posix-extended:                             ✓
              TYPE: posix-minimal-basic:                        ✓
              TYPE: sed:                                        ✓
              == REGEX FEATURE: \< \>
              TYPE: awk:                                        -
              TYPE: egrep:                                      ✓
              TYPE: ed:                                         ✓
              TYPE: emacs:                                      ✓
              TYPE: gnu-awk:                                    ✓
              TYPE: grep:                                       ✓
              TYPE: posix-awk:                                  -
              TYPE: posix-basic:                                ✓
              TYPE: posix-egrep:                                ✓
              TYPE: posix-extended:                             ✓
              TYPE: posix-minimal-basic:                        ✓
              TYPE: sed:                                        ✓
              == REGEX FEATURE: [:digit:]
              TYPE: awk:                                        ✓
              TYPE: egrep:                                      ✓
              TYPE: ed:                                         ✓
              TYPE: emacs:                                      -
              TYPE: gnu-awk:                                    ✓
              TYPE: grep:                                       ✓
              TYPE: posix-awk:                                  ✓
              TYPE: posix-basic:                                ✓
              TYPE: posix-egrep:                                ✓
              TYPE: posix-extended:                             ✓
              TYPE: posix-minimal-basic:                        ✓
              TYPE: sed:                                        ✓
              == REGEX FEATURE: \d
              TYPE: awk:                                        -
              TYPE: egrep:                                      -
              TYPE: ed:                                         -
              TYPE: emacs:                                      -
              TYPE: gnu-awk:                                    -
              TYPE: grep:                                       -
              TYPE: posix-awk:                                  -
              TYPE: posix-basic:                                -
              TYPE: posix-egrep:                                -
              TYPE: posix-extended:                             -
              TYPE: posix-minimal-basic:                        -
              TYPE: sed:                                        -
              == REGEX FEATURE: \s
              TYPE: awk:                                        ✓
              TYPE: egrep:                                      ✓
              TYPE: ed:                                         -
              TYPE: emacs:                                      ✓
              TYPE: gnu-awk:                                    ✓
              TYPE: grep:                                       -
              TYPE: posix-awk:                                  ✓
              TYPE: posix-basic:                                -
              TYPE: posix-egrep:                                ✓
              TYPE: posix-extended:                             ✓
              TYPE: posix-minimal-basic:                        -
              TYPE: sed:                                        -
              =================== BSD find ===================
              == REGEX FEATURE: \{\}
              TYPE: basic:                                      ✓
              TYPE: extended:                                   -
              == REGEX FEATURE: {}
              TYPE: basic:                                      -
              TYPE: extended:                                   ✓
              == REGEX FEATURE: \+
              TYPE: basic:                                      -
              TYPE: extended:                                   -
              == REGEX FEATURE: +
              TYPE: basic:                                      -
              TYPE: extended:                                   ✓
              == REGEX FEATURE: \b
              TYPE: basic:                                      -
              TYPE: extended:                                   -
              == REGEX FEATURE: \< \>
              TYPE: basic:                                      -
              TYPE: extended:                                   -
              == REGEX FEATURE: [:digit:]
              TYPE: basic:                                      ✓
              TYPE: extended:                                   ✓
              == REGEX FEATURE: \d
              TYPE: basic:                                      -
              TYPE: extended:                                   -
              == REGEX FEATURE: \s
              TYPE: basic:                                      -
              TYPE: extended:                                   ✓
              
