Filter folders whose name is a timestamp - pattern matching vs. regex matching using the find utility
Question
I am writing a generic shell script that filters files based on a given regex.
My shell script:
files=$(find $path -name $regex)
In one of the use cases, I want to filter the folders inside a directory; the folder names have the following format:
20161128-20:34:33:432813246
YYYYMMDD-HH:MM:SS:NS
I am unable to arrive at the correct regex.
I am able to get the paths of the files inside the folders using the regex '*data.txt', as I know the name of the file inside them.
But that gives me the full path of the file, something like
/path/20161128-20:34:33:432813246/data.txt
All I want is:
/path/20161128-20:34:33:432813246
Please help me identify the correct regex for my requirement.
Note:
I know how to process the data after
files=$(find $path -name $regex)
but since the script needs to be generic for many use cases, I only need the correct regex to pass in.
Recommended answer
Per POSIX, find's -name and -path primaries (tests) use patterns (a.k.a. wildcard expressions, globs) to match filenames and pathnames. While patterns and regular expressions are distantly related, their syntax and capabilities differ significantly; in short, patterns are syntactically simpler but far less powerful.

- -name matches the pattern against the basename (mere filename) part of an input path only.
- -path matches the pattern against the whole pathname (the full path).
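
A quick way to see the difference is to run both tests against a throwaway tree. The sketch below assumes a POSIX sh and a hypothetical sample tree (a temporary directory from mktemp -d holding one timestamp-named folder with a data.txt inside); the variable name root is illustrative only.

```shell
#!/bin/sh
# Hypothetical sample tree: one timestamp-named folder containing data.txt.
root=$(mktemp -d)
mkdir "$root/20161128-20:34:33:432813246"
: > "$root/20161128-20:34:33:432813246/data.txt"

# -name compares the pattern with the basename only, so '2016*' finds
# the folder no matter how deep it sits.
printf '%s\n' "== -name '2016*' =="
find "$root" -name '2016*'

# -path compares the pattern with the whole pathname ($root/2016...),
# so the same pattern matches nothing here...
printf '%s\n' "== -path '2016*' =="
find "$root" -path '2016*'

# ...while '*/2016*' does match; '*' in -path also matches '/', so this
# reports both the folder and the data.txt inside it.
printf '%s\n' "== -path '*/2016*' =="
find "$root" -path '*/2016*'

rm -rf "$root"
```

This is why -name, not -path, is the natural fit when only the folder's own name matters.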
Both GNU and BSD/macOS find implement nonstandard extensions:

- -iname and -ipath, which work like their standard-compliant counterparts (based on patterns), except that they match case-insensitively.
- -regex and -iregex, which match the whole pathname against a regex (regular expression); -iregex does so case-insensitively.
  - Caveat: Both implementations offer at least 2 regex dialects to choose from (-E activates support for extended regular expressions in BSD find, and GNU find allows selecting from several dialects with -regextype), but no two dialects are exactly the same across the two implementations; see the bottom of this answer for the gory details.
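
The sketch below illustrates two of these extensions on a hypothetical temporary tree: -iname ignores case, and -regex has to match the entire pathname rather than just part of it. It sticks to regex syntax (.* and a bracket expression) that both GNU find's default emacs dialect and BSD find's default basic dialect accept, so no -regextype or -E is needed; the upper-case file name DATA.txt is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample tree; note the upper-case file name.
root=$(mktemp -d)
mkdir "$root/20161128-20:34:33:432813246"
: > "$root/20161128-20:34:33:432813246/DATA.txt"

# -iname: same pattern rules as -name, but case-insensitive,
# so 'data.txt' also finds DATA.txt.
find "$root" -iname 'data.txt'

# -regex: the regex must match the whole pathname (as if anchored at both ends),
# so '2016.*' matches nothing, because every path here starts with $root...
find "$root" -regex '2016.*'

# ...whereas '.*/2016[^/]*' matches the folder, but not the file below it,
# because [^/]* cannot cross a '/'.
find "$root" -regex '.*/2016[^/]*'

rm -rf "$root"
```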
With your folder names following a fixed-width naming scheme, a pattern would work:
pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
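
Because find's -name uses the same pattern matching notation as the shell's case statement, a pattern can be checked against sample names without creating any files. A minimal sketch, assuming a POSIX sh; apart from the name taken from the question, the sample names are made up for illustration.

```shell
#!/bin/sh
# Fixed-width pattern: 8 digits, '-', then 2:2:2:9 digits.
pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

for name in '20161128-20:34:33:432813246' \
            '2016112-20:34:33:432813246' \
            'data.txt'; do
  # $pattern is deliberately unquoted here so that it acts as a pattern,
  # not as a literal string.
  case $name in
    $pattern) printf 'match:    %s\n' "$name" ;;
    *)        printf 'no match: %s\n' "$name" ;;
  esac
done
```

Only the first name matches; the second one has a 7-digit date part, so the fixed-width pattern rejects it.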
Of course, you can take a shortcut if you don't expect false positives:
pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'
Note how * and ?, unlike in a regex, are not duplication symbols (quantifiers) that refer to the preceding expression; by themselves they represent any sequence of characters (*) or any single character (?).
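
To make that concrete, the sketch below runs the shortcut pattern and a fixed-width pattern against the real folder name and two made-up names. With the shortcut, ? accepts any character (not just a digit), and the leading [0-9]* means "one digit, then anything", including '-' and letters, so it accepts names the fixed-width pattern rejects. The helper matches is hypothetical, written only for this illustration.

```shell
#!/bin/sh
strict='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
shortcut='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

# matches PATTERN NAME: hypothetical helper; succeeds if NAME matches PATTERN
# (the unquoted $1 is interpreted as a pattern by case).
matches() {
  case $2 in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

for name in '20161128-20:34:33:432813246' \
            '2016-backup-1a:2b:3c:4' \
            '1-2x:3y:4z:5'; do
  in_strict=no; in_short=no
  matches "$strict" "$name" && in_strict=yes
  matches "$shortcut" "$name" && in_short=yes
  printf '%-30s strict=%-3s shortcut=%s\n' "$name" "$in_strict" "$in_short"
done
```

Both patterns accept the real name, but only the shortcut accepts the two made-up ones, which is the kind of false positive to rule out before taking the shortcut.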
If we put it all together:
files=$(find "$path" -type d -name "$pattern")
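
A self-contained way to try the combined command, assuming a POSIX sh and a hypothetical sample tree created under a temporary directory (two timestamp-named folders, each with a data.txt, plus an unrelated logs folder):

```shell
#!/bin/sh
# Hypothetical sample tree under a temporary directory.
path=$(mktemp -d)
mkdir "$path/20161128-20:34:33:432813246" \
      "$path/20161129-08:00:01:000000001" \
      "$path/logs"
: > "$path/20161128-20:34:33:432813246/data.txt"
: > "$path/20161129-08:00:01:000000001/data.txt"

pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

# Quoted "$path" and "$pattern": find, not the shell, does the matching.
files=$(find "$path" -type d -name "$pattern")
printf '%s\n' "$files"

rm -rf "$path"
```

Only the two timestamp folders are printed (find does not promise any particular order); neither data.txt file nor logs shows up.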
- It's important to double-quote the variable references to protect their values from unwanted shell expansions, notably to preserve any whitespace in the path and to prevent the shell from prematurely globbing the value of $pattern.
- Note that I've added -type d to limit matching to directories (folders), which improves performance.

Optional background information:
Below is a regex feature matrix as of GNU find v4.6.0 / BSD find as found on macOS 10.12.1:
- GNU find features are listed by the types supported by the -regextype option, with emacs being the default.
  - Note that several posix-*-named regex types are misnomers in that they support features beyond what POSIX mandates.
- BSD find features are listed by basic (using NO regex option, which implies platform-flavored BREs) and extended (using option -E, which implies platform-flavored EREs).

For cross-platform use, sticking with POSIX EREs (extended regular expressions) while using -regextype posix-extended with GNU find and -E with BSD find is safe, but note that not all features you may expect will be supported, notably \b, \< / \>, and character class shortcuts such as \d.

(✓ = supported, - = not supported)

=================== GNU find ===================
type                 \{\}  {}  \+  +  \b  \< \>  [:digit:]  \d  \s
awk                  -     -   -   ✓  -   -      ✓          -   ✓
egrep                -     ✓   -   ✓  ✓   ✓      ✓          -   ✓
ed                   ✓     -   ✓   -  ✓   ✓      ✓          -   -
emacs                -     -   -   ✓  ✓   ✓      -          -   ✓
gnu-awk              -     ✓   -   ✓  ✓   ✓      ✓          -   ✓
grep                 ✓     -   ✓   -  ✓   ✓      ✓          -   -
posix-awk            -     ✓   -   ✓  -   -      ✓          -   ✓
posix-basic          ✓     -   ✓   -  ✓   ✓      ✓          -   -
posix-egrep          -     ✓   -   ✓  ✓   ✓      ✓          -   ✓
posix-extended       -     ✓   -   ✓  ✓   ✓      ✓          -   ✓
posix-minimal-basic  ✓     -   -   -  ✓   ✓      ✓          -   -
sed                  ✓     -   ✓   -  ✓   ✓      ✓          -   -

=================== BSD find ===================
type                 \{\}  {}  \+  +  \b  \< \>  [:digit:]  \d  \s
basic                ✓     -   -   -  -   -      ✓          -   -
extended             -     ✓   -   ✓  -   -      ✓          -   ✓
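
Since the question explicitly asked for a regex, here is a sketch of a -regex equivalent that stays within the POSIX ERE subset recommended for cross-platform use ({} intervals and bracket expressions only). Because -regex is matched against the whole pathname, the regex needs a leading .*/. The GNU-versus-BSD detection (probing whether find accepts -regextype) is an assumption of this sketch rather than a documented idiom, and the sample tree is hypothetical.

```shell
#!/bin/sh
# POSIX ERE: -regex must match the whole path, hence the leading '.*/'.
re='.*/[0-9]{8}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{9}'

# Hypothetical sample tree.
path=$(mktemp -d)
mkdir "$path/20161128-20:34:33:432813246" "$path/logs"
: > "$path/20161128-20:34:33:432813246/data.txt"

# GNU find selects the ERE dialect with -regextype posix-extended;
# BSD/macOS find rejects -regextype and uses -E (placed before the path) instead.
if find /dev/null -regextype posix-extended -regex x >/dev/null 2>&1; then
  files=$(find "$path" -regextype posix-extended -type d -regex "$re")
else
  files=$(find -E "$path" -type d -regex "$re")
fi
printf '%s\n' "$files"

rm -rf "$path"
```

For a fixed-width name like this one, the pattern version shown earlier is simpler and fully portable; the regex version mainly pays off when the naming scheme needs alternation or variable-length repetition.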