Modifying files nested in tar archive

Problem description

I am trying to do a grep and then a sed to search for specific strings inside files, which are inside multiple tars, all inside one master tar archive. Right now, I modify the files by

  1. First extracting the master tar archive.
  2. Then extracting all the tars inside it.
  3. Then doing a recursive grep and then sed to replace a specific string in files.
  4. Finally packaging everything again into tar archives, and all the archives inside the master archive.

Pretty tedious. How do I do this automatically using shell scripting?

Recommended answer

There isn't going to be much option except automating the steps you outline, for the reasons demonstrated by the caveats in the answer by Kimvais.

The tar command has some options to modify existing tar files. They are, however, not appropriate for your scenario for multiple reasons, one of them being that it is the nested tarballs that need editing rather than the master tarball. So, you will have to do the work longhand.

Are all the archives in the master archive extracted into the current directory or into a named/created sub-directory? That is, when you run tar -tf master.tar.gz, do you see:

subdir-1.23/tarball1.tar
subdir-1.23/tarball2.tar
...

Or do you see:

tarball1.tar
tarball2.tar

(Note that nested tars should not themselves be gzipped if they are to be embedded in a bigger compressed tarball.)
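The layout question can also be answered mechanically. The sketch below assumes a tar that can list the archive with -tf (GNU tar and bsdtar detect gzip compression on their own when reading from a file). The function names top_level_entries and has_single_top_dir are made up for illustration.

```shell
# top_level_entries ARCHIVE
# Print the distinct first path components of the members of ARCHIVE.
top_level_entries() {
    tar -tf "$1" | sed -e 's|^\./||' -e 's|/.*||' | grep -v '^$' | sort -u
}

# has_single_top_dir ARCHIVE
# Succeed only if every member shares one top-level name and at least
# one member lies beneath it (that is, the name is a directory).
has_single_top_dir() {
    count=$(top_level_entries "$1" | grep -c .)
    [ "$count" -eq 1 ] &&
    tar -tf "$1" | sed 's|^\./||' | grep -q '/.'
}
```

For example, has_single_top_dir master.tar.gz && echo "subdirectory layout" reports the first of the two layouts shown above.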

Assuming you have the subdirectory notation, then you can do:

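for master in "$@"
do
    tmp=$(pwd)/xyz.$$
    # Remove the temporary directory on exit or on HUP, INT, QUIT, PIPE, TERM
    trap 'rm -fr "$tmp"; exit 1' 0 1 2 3 13 15
    cat "$master" |
    (
    mkdir "$tmp" || exit 1
    cd "$tmp" || exit 1
    # -z because the master is gzipped (it is re-created with -czf below);
    # tar does not reliably detect compression when reading from a pipe
    tar -xzf - || exit 1
    cd * || exit 1   # There is only one directory in the newly created one!
    process_tarballs *
    cd ..
    tar -czf - *   # There is only one directory down here
    ) > "new.$master"
    rm -fr "$tmp"
    trap 0
done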

If you're working in a hostile environment, use something other than xyz.$$ for the directory name, since a name built from the process ID is predictable. However, this sort of repackaging is usually not done in a hostile environment, and a name based on the process ID is sufficient to give everything a unique name. The use of tar -f - for input and output allows you to switch directories but still handle relative pathnames on the command line. There are likely other ways to handle that if you want. I also used cat to feed the input to the sub-shell so that the top-to-bottom flow is clear; technically, I could improve things by using ) > new.$master < $master at the end, but that hides some crucial information multiple lines later.
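Where a name based on the process ID is not acceptable, one common alternative is mktemp -d. This is only a sketch: mktemp is not part of POSIX (GNU coreutils and the BSDs provide it), and the make_workdir helper and the repack prefix are illustrative names, not part of the original script.

```shell
# make_workdir: create a private work directory with an unpredictable
# name and print its path. The "repack" prefix is an arbitrary choice.
make_workdir() {
    mktemp -d "${TMPDIR:-/tmp}/repack.XXXXXX"
}

tmp=$(make_workdir) || exit 1
# Same cleanup idea as in the script above, but the name cannot be guessed.
trap 'rm -fr "$tmp"' 0
echo "working in $tmp"
```

The rest of the master script can use $tmp exactly as before; only the mkdir step becomes unnecessary because mktemp -d has already created the directory.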

The trap commands make sure that (a) if the script is interrupted (signals HUP, INT, QUIT, PIPE or TERM), the temporary directory is removed and the exit status is 1 (not success) and (b) once the subdirectory is removed, the process can exit with a zero status.

You might need to check whether new.$master exists before overwriting it. You might need to check that the extract operation actually extracted stuff. You might need to check whether the sub-tarball processing actually worked. If the master tarball extracts into multiple sub-directories, you need to convert the 'cd *' line into some loop that iterates over the sub-directories it creates.

All these issues can be skipped if you know enough about the contents and nothing goes wrong.
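If you would rather not skip them, two of those checks might look roughly like this. It is a sketch under assumptions: check_output_free and for_each_subdir are hypothetical helper names, and the loop treats every directory found after extraction as one that needs processing.

```shell
# check_output_free MASTER
# Refuse to continue if new.MASTER already exists.
check_output_free() {
    if [ -e "new.$1" ]; then
        echo "new.$1 already exists; not overwriting" >&2
        return 1
    fi
}

# for_each_subdir COMMAND [ARGS...]
# Run COMMAND inside every sub-directory of the current directory and
# stop at the first failure. Fails if there are no sub-directories,
# which also catches an extraction that produced nothing.
for_each_subdir() {
    found=0
    for dir in */
    do
        [ -d "$dir" ] || continue   # an unmatched glob stays as literal "*/"
        found=1
        ( cd "$dir" && "$@" ) || return 1
    done
    [ "$found" -eq 1 ]
}
```

Inside the master script, the three lines from cd * to cd .. could then become something like for_each_subdir sh -c 'process_tarballs *', assuming process_tarballs is on your PATH.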

The second script is process_tarballs; it processes each of the tarballs on its command line in turn, extracting the file, making the substitutions, repackaging the result, etc. One advantage of using two scripts is that you can test the tarball processing separately from the bigger task of dealing with a tarball containing multiple tarballs. Again, life will be much easier if each of the sub-tarballs extracts into its own sub-directory; if any of them extracts into the current directory, make sure you create a new sub-directory for it.

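for tarball in "$@"
do
    # Extract $tarball into sub-directory
    tar -xf "$tarball" || exit 1
    # Locate appropriate sub-directory: this assumes every member shares one
    # top-level directory, so the first entry of the listing names it.
    subdirectory=$(tar -tf "$tarball" | sed -n '1{s|^\./||;s|/.*||;p;}')
    [ -n "$subdirectory" ] || exit 1
    (
    cd "$subdirectory" || exit 1
    # Replace the string in every regular file (sed -i as in GNU sed)
    find . -type f -print0 | xargs -0 sed -i 's/name/alternative-name/g'
    ) || exit 1
    # Keep the old tarball until the new one has been written
    mv "$tarball" "old.$tarball"
    tar -cf "$tarball" "$subdirectory" || exit 1
    rm -f "old.$tarball"
    # Remove the extracted copy so it is not repackaged by the outer script
    rm -fr "$subdirectory"
done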

You should add traps to clean up here, too, so the script can be run in isolation from the master script above and still not leave any intermediate directories around. In the context of the outer script, you might not need to be so careful to preserve the old tarball before the new one is created (so rm -f $tarball instead of the move and remove commands), but treated in its own right, the script should be careful not to damage anything.
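One way such a trap could look is sketched below. The cleanup function is a hypothetical helper; it relies on the $tarball and $subdirectory variables used in the loop above and on the old.$tarball naming from the move step.

```shell
tarball=
subdirectory=

# cleanup: undo a half-finished repack of the current tarball.
cleanup() {
    # If the original was moved aside and not yet deleted, put it back
    # over whatever partial replacement exists.
    if [ -n "$tarball" ] && [ -f "old.$tarball" ]; then
        mv -f "old.$tarball" "$tarball"
    fi
    # Remove the extracted working copy, if one has been identified.
    if [ -n "$subdirectory" ]; then
        rm -fr "$subdirectory"
    fi
}

# HUP, INT, QUIT, PIPE, TERM: clean up, then report failure.
trap 'cleanup; exit 1' 1 2 3 13 15
```

Placed before the for loop, this restores the original tarball and removes the extracted directory if the script is interrupted part-way through an archive.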

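  • What you are attempting is not trivial.
  • For debuggability, split the job into two scripts that can be tested independently.
  • Handling the corner cases is much easier when you know what is really in the files.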
