Using a glob expression passed as a bash script argument

Question

Why isn't invoking ./myscript foo* when myscript has var=$1 the same as invoking ./myscript with var=foo* hardcoded?

I've come across a weird issue in a bash script I'm writing. I am sure there is a simple explanation, but I can't figure it out.

I am trying to pass a command line argument to be assigned as a variable in the script.

I want the script to accept two command-line arguments, as follows:

$ bash my_bash_script.bash args1 args2

In my script, I assigned variables like this:

ARGS1=$1
ARGS2=$2

ARGS1 is a string descriptor used in the output file's name.

ARGS2 is a group of directories (dir1, dir2, dir3), which I am passing as dir*.

When I assign dir* to ARGS2 in the script it works fine, but when I pass dir* as the second command line argument, it only includes dir1 in the wildcard expansion of dir*.

I assume this has something to do with how the shell handles wildcards (even when passed as args), but I don't really understand it.

Any help would be greatly appreciated.

I have a set of directories:

dir_1_y_map, dir_1_x_map, dir_2_y_map, dir_2_x_map,
    ... dir_10_y_map, dir_10_x_map...

Inside these directories I am trying to access a file with extension ".status" via *.status, and ".report.txt" via *report.txt.

I want to pass dir_*_map as the second argument to the script and store it in the variable ARGS2, then use it to search within each of the directories for the ".status" and ".report" files.

The issue is that passing dir_*_map from the command line doesn't give the list of directories, but rather just the first item in the list. If I assign the variable ARGS2=dir_*_map within the script, it works as I intend.

It turns out that passing the second argument in quotes allowed the wildcard expansion to work appropriately for "dir_*_map":

#!/usr/bin/env bash
ARGS1=$1    
ARGS2=$2

touch $ARGS1".extension"

for i in /$ARGS2/*.status
do
    grep -e "string" $i >> $ARGS1".extension"
done

Here is an example invocation of the script:

sh ~/path/to/script descriptor "dir_*_map"

I don't fully understand when/why some arguments must be passed in quotes, but I assume it has to do with the wildcard expansion in the for loop.

Answer

Addressing the "why"

Assignments, as in var=foo*, don't expand globs: when you run var=foo*, the literal string foo* is put into the variable var, not the list of files matching foo*.
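
Here is a minimal sketch of the difference. It assumes a scratch directory created with mktemp -d and made-up file names foo01 to foo03. The assignment stores only the pattern text. A later unquoted expansion is what turns it into file names, which is why a hardcoded var=foo* appears to work when the script later uses $var unquoted.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical files foo01..foo03; removed on exit.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch foo01 foo02 foo03

var=foo*                       # assignment: no globbing, var holds the text foo*
printf 'stored: %s\n' "$var"   # quoted use prints exactly what was stored

# Unquoted use is word-split and globbed at the moment of use.
printf 'expanded: %s\n' $var
```

Run in bash, this prints "stored: foo*" followed by one "expanded:" line per matching file.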

By contrast, unquoted use of foo* on a command line expands the glob, replacing it with a list of individual names, each of which is passed as a separate argument.

Thus, running ./yourscript foo* doesn't pass foo* as $1 unless no files matching that glob expression exist; instead, it becomes something like ./yourscript foo01 foo02 foo03, with each argument in a different spot on the command line.
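
You can observe this without writing a separate script. The sketch below uses a shell function, show_args (a hypothetical name), as a stand-in for ./yourscript, because a function receives its arguments after the same expansions a script would. The file names are made up. The last call shows the case where no files match, under bash's default settings (neither nullglob nor failglob enabled).

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for ./yourscript: it reports how many
# arguments it received and what $1 was.
show_args() {
  printf 'argc=%s first=%s\n' "$#" "$1"
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch foo01 foo02 foo03        # hypothetical matching files

show_args foo*       # expanded by the caller:    argc=3 first=foo01
show_args "foo*"     # quoted, passed literally:  argc=1 first=foo*
show_args bar*       # no match, default options: argc=1 first=bar*
```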

The reason running ./yourscript "foo*" works as a workaround is that the unquoted expansion inside the script lets the glob be expanded at that later time. However, this is bad practice. Glob expansion is applied right after string-splitting, so relying on this behavior removes your ability to pass filenames containing characters found in IFS (typically whitespace). It also means you can't pass literal filenames that could be interpreted as globs: if you have a file named [1] and a file named 1, passing [1] would always be replaced with 1.
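
Both failure modes can be reproduced directly. This sketch assumes a scratch directory with made-up names. It imitates the workaround by expanding a string variable unquoted, as the question's script does with $ARGS2, and contrasts that with letting the calling shell expand the glob. The helper show_words is hypothetical; it wraps each argument in angle brackets so word boundaries are visible.

```shell
#!/usr/bin/env bash
# show_words prints each argument in <...> so word boundaries are visible.
show_words() { printf '<%s>' "$@"; printf '\n'; }

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
mkdir 'my dir'                 # hypothetical name containing a space
touch '[1]' 1                  # a file literally named [1], plus one named 1

# Workaround style: the script gets a string and expands it unquoted.
pattern='my dir'
show_words $pattern            # <my><dir>: split on IFS, the name is lost
pattern='[1]'
show_words $pattern            # <1>: [1] is treated as a glob matching "1"

# Idiomatic style: the caller's shell expands the glob, names stay whole.
show_words my*                 # <my dir>
show_words '[1]'               # <[1]>
```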

The idiomatic way to build this would be to shift away the first argument, and then iterate over subsequent ones, like so:

#!/bin/bash
out_base=$1; shift

shopt -s nullglob                 # avoid generating an error if a directory has no .status

for dir; do                       # iterate over directories passed in $2, $3, etc
  for file in "$dir"/*.status; do # iterate over files ending in .status within those
      grep -e "string" "$file"    # match a single file
  done
done >"${out_base}.extension"
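
To try the loop without creating a script file, the sketch below wraps it in a function, collect_status (a hypothetical name), and runs it against a scratch tree. The directory names and file contents are made up. One directory name contains a space, and one directory has no .status file, so nullglob has something to do.

```shell
#!/usr/bin/env bash
# The loop from the script above, wrapped in a function (collect_status is a
# hypothetical name) so it can be run against a throwaway directory tree.
collect_status() {
  local out_base=$1; shift
  local dir file
  shopt -s nullglob              # a directory without .status files is skipped
  for dir; do                    # iterate over the remaining arguments
    for file in "$dir"/*.status; do
      grep -e "string" "$file"
    done
  done >"${out_base}.extension"
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
mkdir dir_1_x_map dir_2_y_map 'dir_3 x_map'   # made-up names, one with a space
printf 'string one\nother\n' > dir_1_x_map/a.status
printf 'string three\n' > 'dir_3 x_map/b.status'
# dir_2_y_map is deliberately left empty

collect_status descriptor dir_*_map   # unquoted: the glob is expanded here
cat descriptor.extension
```

The last line prints "string one" and "string three". The empty dir_2_y_map contributes nothing. 'dir_3 x_map' is handled as a single directory because the glob's results are never re-split.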


If you have many .status files in a single directory, all this can be made more efficient by using find to invoke grep with as many arguments as possible, rather than calling grep individually on a per-file basis:

#!/bin/bash
out_base=$1; shift

# -h keeps file names out of the output, matching the per-file loop above
find "$@" -maxdepth 1 -type f -name '*.status' \
  -exec grep -h -e "string" -- /dev/null '{}' + \
  >"${out_base}.extension"
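
The efficiency gain comes from the + terminator of -exec. It makes find append as many file names as fit onto one command line, whereas \; runs the command once per file. The sketch below assumes a scratch directory with three made-up .status files. It swaps grep for a tiny sh -c command that just reports how many file names it received.

```shell
#!/usr/bin/env bash
# Scratch directory with three hypothetical .status files.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch a.status b.status c.status

# Each run of the child shell prints how many file names it was given.
# With "+", find packs all three names into a single invocation.
batched=$(find . -maxdepth 1 -type f -name '*.status' -exec sh -c 'echo "$#"' sh {} +)
# With "\;", find runs the command once per file.
per_file=$(find . -maxdepth 1 -type f -name '*.status' -exec sh -c 'echo "$#"' sh {} \;)

printf 'batched:\n%s\nper file:\n%s\n' "$batched" "$per_file"
```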


Both scripts above expect the globs not to be quoted in the invoking shell. Thus, usage is of the form:

# being unquoted, this expands the glob into a series of separate arguments
your_script descriptor dir_*_map

This is considerably better practice than passing globs to your script (which then has to expand them to find the actual files to use). It works correctly with filenames containing whitespace (which the other practice doesn't) and with files whose names are themselves glob expressions.

A few other points:

  • Always put double quotes around expansions! Failing to do so results in the additional steps of string-splitting and glob expansion (in that order) being applied. If you want globbing, as in the case of "$dir"/*.status, then end the quotes before the glob expression starts.
  • for dir; do is precisely equivalent to for dir in "$@"; do, which iterates over the arguments. Don't make the mistake of using for dir in $*; do or for dir in $@; do instead! These latter forms join the elements of the list with the first character of IFS (which, by default, contains the space, the tab and the newline, in that order), then split the resulting string on any IFS characters found within, then expand each component of the resulting list as a glob.
  • Passing /dev/null as an argument to grep is a safety measure: It ensures that you don't have different behavior between the single-argument and multi-argument cases (as an example, grep defaults to printing filenames within output only when passed multiple arguments), and ensures that you can't have grep hang trying to read from stdin if it's passed no additional filenames at all (which find won't do here, but xargs can).
  • Using lower-case names for your own variables (as opposed to system- and shell-provided variables, which have all-uppercase names) is in accordance with POSIX-specified convention; see the fourth paragraph of the POSIX specification regarding environment variables, keeping in mind that environment variables and shell variables share a namespace.
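
Two of the points above can be checked quickly. The sketch below uses a hypothetical helper, count_args, and made-up argument and file names. It first compares "$@" with unquoted $@ and $*. It then shows the file-name prefix that grep adds only when given more than one file, which is the inconsistency the extra /dev/null argument removes.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that prints how many arguments it got.
count_args() { printf '%s\n' "$#"; }

set -- 'my dir' 'other'        # pretend two directory names were passed in

count_args "$@"                # 2: each argument kept intact
count_args $@                  # 3: 'my dir' is split on the space
count_args $*                  # 3: same problem
for dir; do printf '<%s>' "$dir"; done; printf '\n'   # <my dir><other>

# grep prefixes matches with the file name only when given several files,
# so adding /dev/null (which never matches) makes the format consistent.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
printf 'string\n' > one.status           # hypothetical file
grep -e string one.status                # string
grep -e string /dev/null one.status      # one.status:string
```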
